Methods and compositions for polypeptide engineering

ABSTRACT

Methods are provided for the evolution of proteins of industrial and pharmaceutical interest, including methods for effecting recombination and selection. Compositions produced by these methods are also disclosed.

This application is a continuation-in-part of U.S. patent application Ser. No. 08/198,431, filed Feb. 17, 1994, Serial No. PCT/US95/02126, filed Feb. 17, 1995, Ser. No. 08/425,684, filed Apr. 18, 1995, Ser. No. 08/537,874, filed Oct. 30, 1995, Ser. No. 08/564,955, filed Nov. 30, 1995, Ser. No. 08/621,859, filed Mar. 25, 1996, Ser. No. 08/621,430, filed Mar. 25, 1996, Serial No. PCT/US96/05480, filed Apr. 18, 1996, Ser. No. 08/650,400, filed May 20, 1996, Ser. No. 08/675,502, filed Jul. 3, 1996, Ser. No. 08/721,824, filed Sep. 27, 1996, and Ser. No. 08/722,660, filed Sep. 27, 1996, the specifications of which are herein incorporated by reference in their entirety for all purposes.

BACKGROUND OF THE INVENTION

Recursive sequence recombination entails performing iterative cycles of recombination and screening or selection to “evolve” individual genes, whole plasmids or viruses, multigene clusters, or even whole genomes (Stemmer, Bio/Technology 13:549-553 (1995)). Such techniques do not require the extensive analysis and computation required by conventional methods for polypeptide engineering. Recursive sequence recombination allows the recombination of large numbers of mutations in a minimum number of selection cycles, in contrast to traditional, pairwise recombination events.

Thus, recursive sequence recombination (RSR) techniques provide particular advantages in that they provide recombination between mutations in any or all of these substrates, thereby providing a very fast way of exploring the manner in which different combinations of mutations can affect a desired result.

In some instances, however, structural and/or functional information is available which, although not required for recursive sequence recombination, provides opportunities for modification of the technique. In other instances, selection and/or screening of a large number of recombinants can be costly or time-consuming. A further problem can be the manipulation of large nucleic acid molecules. The instant invention addresses these issues and others.

SUMMARY OF THE INVENTION

One aspect of the invention is a method for evolving a protein encoded by a DNA substrate molecule comprising:

(a) digesting at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules differ from each other in at least one nucleotide, with a restriction endonuclease;

(b) ligating the mixture to generate a library of recombinant DNA molecules;

(c) screening or selecting the products of (b) for a desired property; and

(d) recovering a recombinant DNA substrate molecule encoding an evolved protein.

A further aspect of the invention is a method for evolving a protein encoded by a DNA substrate molecule by recombining at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules differ from each other in at least one nucleotide and comprise defined segments, the method comprising:

(a) providing a set of oligonucleotide PCR primers, comprising at least one primer for each segment, wherein the primer sequence is complementary to at least one junction with another segment;

(b) amplifying the segments of the at least a first and second DNA substrate molecules with the primers of step (a) in a polymerase chain reaction;

(c) assembling the products of step (b) to generate a library of recombinant DNA substrate molecules;

(d) screening or selecting the products of (c) for a desired property; and

(e) recovering a recombinant DNA substrate molecule from (d) encoding an evolved protein.

A further aspect of the invention is a method of enriching a population of DNA fragments for mutant sequences comprising:

(a) denaturing and renaturing the population of fragments to generate a population of hybrid double-stranded fragments in which at least one double-stranded fragment comprises at least one base pair mismatch;

(b) fragmenting the products of (a) into fragments of about 20-100 bp;

(c) affinity-purifying fragments having a mismatch on an affinity matrix to generate a pool of DNA fragments enriched for mutant sequences; and

(d) assembling the products of (c) to generate a library of recombinant DNA substrate molecules.
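Why mismatch selection enriches for mutants can be illustrated with a toy simulation. This is a minimal sketch, not the claimed method: it assumes equal-length, pre-aligned sequences, represents both strands of a duplex by their sense-strand sequence (so a mismatch is simply a position where the two strings differ), models renaturation as random re-pairing of strands, and uses a string comparison in place of the affinity matrix. The function name `mismatch_enrichment` and its parameters are hypothetical.

```python
import random

def mismatch_enrichment(population, wild_type, window=50, seed=0):
    """Return (fraction of mutant fragments before, fraction after) mismatch selection."""
    rng = random.Random(seed)
    # (a) Denature and renature: each top strand re-pairs with a random bottom strand.
    top = list(population)
    bottom = list(population)
    rng.shuffle(bottom)

    def windows(seq):
        # (b) Fragment into fixed-size pieces (the text uses about 20-100 bp).
        return [seq[i:i + window] for i in range(0, len(seq), window)]

    wt_windows = windows(wild_type)
    before = [w != wt for seq in population for w, wt in zip(windows(seq), wt_windows)]

    kept = []
    for a, b in zip(top, bottom):
        for wa, wb, wt in zip(windows(a), windows(b), wt_windows):
            if wa != wb:  # (c) a heteroduplex with >= 1 mismatch is retained
                kept.extend([wa != wt, wb != wt])
    after = sum(kept) / len(kept) if kept else 0.0
    return sum(before) / len(before), after
```

Because every retained duplex contains two non-identical strands, at least half of the retained strands differ from the wild type, whereas rare mutants make up only a small fraction of the starting fragments.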

A further aspect of the invention is a method for evolving a protein encoded by a DNA substrate molecule, by recombining at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules share a region of sequence homology of about 10 to 100 base pairs and comprise defined segments, the method comprising:

(a) providing regions of homology in the at least a first and second DNA substrate molecules by inserting an intron sequence between at least two defined segments;

(b) fragmenting and recombining DNA substrate molecules of (a), wherein regions of homology are provided by the introns;

(c) screening or selecting the products of (b) for a desired property; and

(d) recovering a recombinant DNA substrate molecule from the products of (c) encoding an evolved protein.

A further aspect of the invention is a method for evolving a protein encoded by a DNA substrate molecule by recombining at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules differ from each other in at least one nucleotide and comprise defined segments, the method comprising:

(a) providing a set of oligonucleotide PCR primers, wherein for each strand of each segment a pair of primers is provided, one member of each pair bridging the junction at one end of the segment and the other bridging the junction at the other end of the segment, with the terminal ends of the DNA molecule having as one member of the pair a generic primer, and wherein a set of primers is provided for each of the at least a first and second substrate molecules;

(b) amplifying the segments of the at least a first and second DNA substrate molecules with the primers of (a) in a polymerase chain reaction;

(c) assembling the products of (b) to generate a pool of recombinant DNA molecules;

(d) selecting or screening the products of (c) for a desired property; and

(e) recovering a recombinant DNA substrate molecule from the products of (d) encoding an evolved protein.
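The junction-bridging primer set described in step (a) can be pictured as a small design routine. This sketch is illustrative only: it assumes the segment boundaries are already known and lie at least `arm` bases from either end, uses a fixed arm length on each side of a junction rather than any melting-temperature design rule, and substitutes the terminal sequence of the molecule wherever no generic primer is supplied. `junction_primers` and its parameters are hypothetical names.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def junction_primers(seq, boundaries, arm=10, generic_fwd=None, generic_rev=None):
    """Return one (forward, reverse) primer pair per segment.

    Internal primers span a junction with `arm` bases on each side; the two
    terminal ends use generic primers (defaulting to the molecule's own ends)."""
    edges = [0, *sorted(boundaries), len(seq)]
    if any(b - arm < 0 or b + arm > len(seq) for b in edges[1:-1]):
        raise ValueError("each junction must lie at least `arm` bases from either end")
    pairs = []
    for start, end in zip(edges, edges[1:]):
        # Forward primer: generic at the 5' terminus, junction-bridging elsewhere.
        fwd = (generic_fwd or seq[:2 * arm]) if start == 0 else seq[start - arm:start + arm]
        # Reverse primer: generic at the 3' terminus, junction-bridging elsewhere.
        if end == len(seq):
            rev = generic_rev or reverse_complement(seq[-2 * arm:])
        else:
            rev = reverse_complement(seq[end - arm:end + arm])
        pairs.append((fwd, rev))
    return pairs
```

Because the forward primer of each segment is the reverse complement of the reverse primer of the preceding segment, adjacent amplicons overlap across every junction and can be joined by overlap extension. The same routine would be run once for each substrate molecule, so that segments from different substrates share compatible ends.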

A further aspect of the invention is a method for optimizing expression of a protein by evolving the protein, wherein the protein is encoded by a DNA substrate molecule, comprising:

(a) providing a set of oligonucleotides, wherein each oligonucleotide comprises at least two regions complementary to the DNA molecule and at least one degenerate region, each degenerate region encoding a region of an amino acid sequence of the protein;

(b) assembling the set of oligonucleotides into a library of full-length genes;

(c) expressing the products of (b) in a host cell;

(d) screening the products of (c) for improved expression of the protein; and

(e) recovering a recombinant DNA substrate molecule encoding an evolved protein from (d).
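The number of variants encoded by a degenerate region grows multiplicatively with each degenerate position, which bears directly on how many clones must be screened in step (d). A minimal sketch, assuming the degenerate regions are written with the standard IUPAC nucleotide ambiguity codes (the text does not specify a notation); `library_size` and `expand` are hypothetical helper names.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def library_size(degenerate):
    """Number of distinct sequences encoded by a degenerate oligonucleotide region."""
    return prod(len(IUPAC[code]) for code in degenerate.upper())

def expand(degenerate):
    """Enumerate every sequence encoded by a degenerate oligonucleotide region."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in degenerate.upper()))]
```

For example, `library_size("GCN")` is 4, corresponding to the four alanine codons GCA, GCC, GCG and GCT, so a GCN codon varies codon usage without changing the encoded amino acid; three consecutive NNK codons give 32³ = 32,768 variants.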

A further aspect of the invention is a method for optimizing expression of a protein encoded by a DNA substrate molecule by evolving the protein, wherein the DNA substrate molecule comprises at least one lac operator and a fusion of a DNA sequence encoding the protein with a DNA sequence encoding a lac headpiece dimer, the method comprising:

(a) transforming a host cell with a library of mutagenized DNA substrate molecules;

(b) inducing expression of the protein encoded by the library of (a);

(c) preparing an extract of the product of (b);

(d) fractionating insoluble protein from complexes of soluble protein and DNA; and

(e) recovering a DNA substrate molecule encoding an evolved protein from (d).

A further aspect of the invention is a method for evolving functional expression of a protein encoded by a DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a DNA sequence encoding a filamentous phage protein to generate a fusion protein, the method comprising:

(a) providing a host cell producing infectious particles expressing a fusion protein encoded by a library of mutagenized DNA substrate molecules;

(b) recovering from (a) infectious particles displaying the fusion protein;

(c) affinity purifying particles displaying the mutant protein using a ligand for the protein; and

(d) recovering a DNA substrate molecule encoding an evolved protein from affinity purified particles of (c).

A further aspect of the invention is a method for optimizing expression of a protein encoded by a DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a lac headpiece dimer, wherein the DNA substrate molecule is present on a first plasmid vector, the method comprising:

(a) providing a host cell transformed with the first vector and a second vector comprising a library of mutants of at least one chaperonin gene, and at least one lac operator;

(b) preparing an extract of the product of (a);

(c) fractionating insoluble protein from complexes of soluble protein and DNA; and

(d) recovering DNA encoding a chaperonin gene from (c).

A further aspect of the invention is a method for optimizing expression of a protein encoded by a DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a filamentous phage gene, wherein the fusion is carried on a phagemid comprising a library of chaperonin gene mutants, the method comprising:

(a) providing a host cell producing infectious particles expressing a fusion protein encoded by a library of mutagenized DNA substrate molecules;

(b) recovering from (a) infectious particles displaying the fusion protein;

(c) affinity purifying particles displaying the protein using a ligand for the protein; and

(d) recovering DNA encoding the mutant chaperonin from affinity purified particles of (c).

A further aspect of the invention is a method for optimizing secretion of a protein in a host by evolving a gene encoding a secretory function, comprising:

(a) providing a cluster of genes encoding secretory functions;

(b) recombining at least a first and second sequence in the gene cluster of (a) encoding a secretory function, the at least a first and second sequences differing from each other in at least one nucleotide, to generate a library of recombinant sequences;

(c) transforming a host cell culture with the products of (b), wherein the host cell comprises a DNA sequence encoding the protein;

(d) subjecting the product of (c) to screening or selection for secretion of the protein; and

(e) recovering DNA encoding an evolved gene encoding a secretory function from the product of (d).

A further aspect of the invention is a method for evolving an improved DNA polymerase comprising:

(a) providing a library of mutant DNA substrate molecules encoding mutant DNA polymerase;

(b) screening extracts of cells transfected with (a) and comparing activity with wild-type DNA polymerase;

(c) recovering mutant DNA substrate molecules from cells in (b) expressing mutant DNA polymerase having improved activity over wild-type DNA polymerase; and

(d) recovering a DNA substrate molecule encoding an evolved polymerase from the products of (c).

A further aspect of the invention is a method for evolving a DNA polymerase with an error rate greater than that of wild-type DNA polymerase comprising:

(a) providing a library of mutant DNA substrate molecules encoding mutant DNA polymerase in a host cell comprising an indicator gene having a revertible mutation, wherein the indicator gene is replicated by the mutant DNA polymerase;

(b) screening the products of (a) for revertants of the indicator gene;

(c) recovering mutant DNA substrate molecules from revertants; and

(d) recovering a DNA substrate molecule encoding an evolved polymerase from the products of (c).

A further aspect of the invention is a method for evolving a DNA polymerase, comprising:

(a) providing a library of mutant DNA substrate molecules encoding mutant DNA polymerase, the library comprising a plasmid vector;

(b) preparing plasmid preparations and extracts of host cells transfected with the products of (a);

(c) amplifying each plasmid preparation in a PCR reaction using the mutant polymerase encoded by that plasmid, the polymerase being present in the host cell extract;

(d) recovering the PCR products of (c); and

(e) recovering a DNA substrate molecule encoding an evolved polymerase from the products of (d).

A further aspect of the invention is a method for evolving a p-nitrophenol phosphonatase from a phosphonatase encoded by a DNA substrate molecule, comprising:

(a) providing a library of mutants of the DNA substrate molecule, the library comprising a plasmid expression vector;

(b) transfecting a host, wherein the host phn operon is deleted;

(c) selecting for growth of the transfectants of (b) using a p-nitrophenol phosphonate as a substrate;

(d) recovering the DNA substrate molecules from transfectants selected from (c); and

(e) recovering a DNA substrate molecule from (d) encoding an evolved phosphonatase.

A further aspect of the invention is a method for evolving a protease encoded by a DNA substrate molecule comprising:

(a) providing a library of mutants of the DNA substrate molecule, the library comprising a plasmid expression vector, wherein the DNA substrate molecule is linked to a secretory leader;

(b) transfecting a host;

(c) selecting for growth of the transfectants of (b) on a complex protein medium; and

(d) recovering a DNA substrate molecule from (c) encoding an evolved protease.

A further aspect of the invention is a method for screening a library of protease mutants displayed on a phage to obtain an improved protease, wherein a DNA substrate molecule encoding the protease is fused to DNA encoding a filamentous phage protein to generate a fusion protein, comprising:

(a) providing host cells expressing the fusion protein;

(b) overlaying host cells with a protein net to entrap the phage;

(c) washing the product of (b) to recover phage liberated by digestion of the protein net;

(d) recovering DNA from the product of (c); and

(e) recovering a DNA substrate from (d) encoding an improved protease.

A further aspect of the invention is a method for screening a library of protease mutants to obtain an improved protease, the method comprising:

(a) providing a library of peptide substrates, each peptide substrate comprising a fluorophore and a fluorescence quencher;

(b) screening the library of protease mutants for ability to cleave the peptide substrates, wherein fluorescence is measured; and

(c) recovering DNA encoding at least one protease mutant from (b).

A further aspect of the invention is a method for evolving an alpha interferon gene comprising:

(a) providing a library of mutant alpha interferon genes, the library comprising a filamentous phage vector;

(b) stimulating cells comprising a reporter construct, the reporter construct comprising a reporter gene under control of an interferon-responsive promoter, and wherein the reporter gene is GFP;

(c) separating the cells expressing GFP by FACS;

(d) recovering phage from the product of (c); and

(e) recovering an evolved interferon gene from the product of (d).

A further aspect of the invention is a method for screening a library of mutants of a DNA substrate encoding a protein for an evolved DNA substrate, comprising:

(a) providing a library of mutants, the library comprising an expression vector;

(b) transfecting a mammalian host cell with the library of (a), wherein mutant protein is expressed on the surface of the cell;

(c) screening or selecting the products of (b) with a ligand for the protein;

(d) recovering DNA encoding mutant protein from the products of (c); and

(e) recovering an evolved DNA substrate from the products of (d).

A further aspect of the invention is a method for evolving a DNA substrate molecule encoding an interferon alpha, comprising:

(a) providing a library of mutant alpha interferon genes, the library comprising an expression vector wherein the alpha interferon genes are expressed under the control of an inducible promoter;

(b) transfecting host cells with the library of (a);

(c) contacting the product of (b) with a virus;

(d) recovering DNA encoding a mutant alpha interferon from host cells surviving step (c); and

(e) recovering an evolved interferon gene from the product of (d).

A further aspect of the invention is a method for evolving the stability of a protein encoded by a DNA substrate molecule, the DNA substrate molecule comprising a fusion of a DNA sequence encoding the protein with a DNA sequence encoding a filamentous phage protein to generate a fusion protein, the method comprising:

(a) providing a host cell expressing a library of mutants of the fusion protein;

(b) affinity purifying the mutants with a ligand for the protein, wherein the ligand is a human serum protein, tissue specific protein, or receptor;

(c) recovering DNA encoding a mutant protein from the affinity selected mutants of (b); and

(d) recovering an evolved gene encoding the protein from the product of (c).

A further aspect of the invention is a method for evolving a protein having at least two subunits, comprising:

(a) providing a library of mutant DNA substrate molecules for each subunit;

(b) recombining the libraries into a library of single chain constructs of the protein, the single chain construct comprising a DNA substrate molecule encoding each subunit sequence, the subunit sequences being linked by a linker at a nucleic acid sequence encoding the amino terminus of one subunit to a nucleic acid sequence encoding the carboxy terminus of a second subunit;

(c) screening or selecting the products of (b);

(d) recovering recombinant single chain construct DNA substrate molecules from the products of (c);

(e) subjecting the products of (d) to mutagenesis; and

(f) recovering an evolved single chain construct DNA substrate molecule from (e).

A further aspect of the invention is a method for evolving the coupling of a mammalian 7-transmembrane receptor to a yeast signal transduction pathway, comprising:

(a) expressing a library of mammalian G alpha protein mutants in a host cell, wherein the host cell expresses the mammalian 7-transmembrane receptor and a reporter gene, the reporter gene being expressed under control of a pheromone-responsive promoter;

(b) screening or selecting the products of (a) for expression of the reporter gene in the presence of a ligand for the 7-transmembrane receptor; and

(c) recovering DNA encoding an evolved G alpha protein mutant from screened or selected products of (b).

A further aspect of the invention is a method for recombining at least a first and second DNA substrate molecule, comprising:

(a) transfecting a host cell with at least a first and second DNA substrate molecule wherein the at least a first and second DNA substrate molecules are recombined in the host cell;

(b) screening or selecting the products of (a) for a desired property; and

(c) recovering recombinant DNA substrate molecules from (b).

A further aspect of the invention is a method for evolving a DNA substrate sequence encoding a protein of interest, wherein the DNA substrate comprises a vector, the vector comprising single-stranded DNA, the method comprising:

(a) providing single-stranded vector DNA and a library of mutants of the DNA substrate sequence;

(b) annealing single-stranded DNA from the library of (a) to the single-stranded vector DNA of (a);

(c) transforming the products of (b) into a host;

(d) screening the product of (c) for a desired property; and

(e) recovering an evolved DNA substrate sequence from the products of (d).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts the alignment of oligo PCR primers for evolution of bovine calf intestinal alkaline phosphatase.

FIG. 2 depicts the alignment of alpha interferon amino acid and nucleic acid sequences.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

The invention provides a number of strategies for evolving polypeptides through recursive recombination methods. In some embodiments, the strategies of the invention can generally be classified as “coarse grain shuffling” and “fine grain shuffling.” As described in detail below, these strategies are especially applicable in situations where some structural or functional information is available regarding the polypeptides of interest, where the nucleic acid to be manipulated is large, when selection or screening of many recombinants is cumbersome, and so on. “Coarse grain shuffling” generally involves the exchange or recombination of segments of nucleic acids, whether defined as functional domains, exons, restriction endonuclease fragments, or otherwise arbitrarily defined segments. “Fine grain shuffling” generally involves the introduction of sequence variation within a segment, such as within codons.

Coarse grain and fine grain shuffling allow analysis of variation occurring within a nucleic acid sequence, also termed “searching of sequence space.” Although both techniques are useful, the results are qualitatively different. For example, coarse grain searches are often better suited for optimizing multigene clusters such as polyketide operons, whereas fine grain searches are often optimal for optimizing a property such as protein expression using codon usage libraries.

The strategies generally entail evolution of gene(s) or segment(s) thereof to allow retention of function in a heterologous cell or improvement of function in a homologous or heterologous cell. Evolution is effected generally by a process termed recursive sequence recombination. Recursive sequence recombination can be achieved in many different formats and permutations of formats, as described in further detail below. These formats share some common principles. Recursive sequence recombination entails successive cycles of recombination to generate molecular diversity, i.e., the creation of a family of nucleic acid molecules showing substantial sequence identity to each other but differing in the presence of mutations. Each recombination cycle is followed by at least one cycle of screening or selection for molecules having a desired characteristic. The molecule(s) selected in one round form the starting materials for generating diversity in the next round. In any given cycle, recombination can occur in vivo or in vitro. Furthermore, diversity resulting from recombination can be augmented in any cycle by applying prior methods of mutagenesis (e.g., error-prone PCR or cassette mutagenesis, passage through bacterial mutator strains, treatment with chemical mutagens) to either the substrates for or products of recombination.
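The cycle described above (recombine, screen or select, then use the selected molecules as the next round's starting material) can be sketched as a loop. This is a toy model under stated assumptions, not the method itself: sequences are equal-length strings, recombination is modeled as uniform crossover between randomly chosen parents, optional random point mutation stands in for error-prone PCR or mutator strains, and `fitness` is a hypothetical scoring function standing in for a real screen. `recombine` and `evolve` are likewise hypothetical names.

```python
import random

ALPHABET = "ACGT"

def recombine(parents, library_size, rng, mutation_rate=0.0):
    """Build a library from the parents plus uniform-crossover recombinants,
    optionally adding random point mutations."""
    library = list(parents)  # selected molecules are carried into the new pool
    while len(library) < library_size:
        a, b = rng.choice(parents), rng.choice(parents)
        child = [rng.choice(pair) for pair in zip(a, b)]
        for i in range(len(child)):
            if rng.random() < mutation_rate:
                child[i] = rng.choice(ALPHABET)
        library.append("".join(child))
    return library

def evolve(start, fitness, rounds=10, library_size=200, keep=10,
           mutation_rate=0.01, seed=0):
    """Recursive recombination: recombine, screen, select, repeat."""
    rng = random.Random(seed)
    parents = list(start)
    best_per_round = []
    for _ in range(rounds):
        library = recombine(parents, library_size, rng, mutation_rate)
        library.sort(key=fitness, reverse=True)  # screening
        parents = library[:keep]                  # selection
        best_per_round.append(fitness(parents[0]))
    return parents, best_per_round
```

Because the selected parents are carried into each new library in this model, the best score can never decrease from one round to the next. In the laboratory, `fitness` is replaced by the screen or selection for the desired property.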

I. Formats for Recursive Sequence Recombination

Some formats and examples for recursive sequence recombination, sometimes referred to as DNA shuffling, evolution, or molecular breeding, have been described by the present inventors and co-workers in co-pending applications U.S. patent application Ser. No. 08/198,431, filed Feb. 17, 1994, Serial No. PCT/US95/02126, filed Feb. 17, 1995, Ser. No. 08/425,684, filed Apr. 18, 1995, Ser. No. 08/537,874, filed Oct. 30, 1995, Ser. No. 08/564,955, filed Nov. 30, 1995, Ser. No. 08/621,859, filed Mar. 25, 1996, Ser. No. 08/621,430, filed Mar. 25, 1996, Serial No. PCT/US96/05480, filed Apr. 18, 1996, Ser. No. 08/650,400, filed May 20, 1996, Ser. No. 08/675,502, filed Jul. 3, 1996, Ser. No. 08/721,824, filed Sep. 27, 1996, and Ser. No. 08/722,660, filed Sep. 27, 1996; Stemmer, Science 270:1510 (1995); Stemmer et al., Gene 164:49-53 (1995); Stemmer, Bio/Technology 13:549-553 (1995); Stemmer, Proc. Natl. Acad. Sci. U.S.A. 91:10747-10751 (1994); Stemmer, Nature 370:389-391 (1994); Crameri et al., Nature Medicine 2(1):1-3 (1996); Crameri et al., Nature Biotechnology 14:315-319 (1996), each of which is incorporated by reference in its entirety for all purposes.

In general, the term “gene” is used herein broadly to refer to any segment or sequence of DNA associated with a biological function. Genes can be obtained from a variety of sources, including cloning from a source of interest or synthesizing from known or predicted sequence information, and may include sequences designed to have desired parameters.

A wide variety of cell types can be used as a recipient of evolved genes. Cells of particular interest include many bacterial cell types, both gram-negative and gram-positive, such as Rhodococcus, Streptomycetes, Actinomycetes, Corynebacteria, Bacillus, Escherichia coli, Pseudomonas, Salmonella, and Erwinia. Cells of interest also include eukaryotic cells, particularly mammalian cells (e.g., mouse, hamster, primate, human), both cell lines and primary cultures. Such cells include stem cells, including embryonic stem cells, zygotes, fibroblasts, lymphocytes, Chinese hamster ovary (CHO) cells, mouse fibroblasts (NIH3T3), and kidney, liver, muscle, and skin cells. Other eukaryotic cells of interest include plant cells, such as maize, rice, wheat, cotton, soybean, sugarcane, tobacco, and arabidopsis; fish; algae; fungi (Penicillium, Fusarium, Aspergillus, Podospora, Neurospora); insects; and yeasts (Pichia and Saccharomyces).

The choice of host depends on a number of factors related to the intended use of the engineered host, including pathogenicity, substrate range, environmental hardiness, presence of key intermediates, ease of genetic manipulation, and likelihood of promiscuous transfer of genetic information to other organisms. A preferred host has the ability to replicate vector DNA, express proteins of interest, and properly traffic proteins of interest. Particularly advantageous hosts are E. coli, lactobacilli, Streptomycetes, Actinomycetes, fungi such as Saccharomyces cerevisiae or Pichia pastoris, Schneider cells, L-cells, COS cells, CHO cells, and transformed B cell lines such as SP2/0, J558, NS-1 and AG8-653.

The breeding procedure starts with at least two substrates that generally show substantial sequence identity to each other (i.e., at least about 50%, 70%, 80% or 90% sequence identity), but differ from each other at certain positions. The difference can be any type of mutation, for example, substitutions, insertions and deletions. Often, different segments differ from each other at about 5-20 positions. For recombination to generate increased diversity relative to the starting materials, the starting materials must differ from each other in at least two nucleotide positions. That is, if there are only two substrates, there should be at least two divergent positions. If there are three substrates, for example, one substrate can differ from the second at a single position, and the second can differ from the third at a different single position. The starting DNA segments can be natural variants of each other, for example, allelic or species variants. The segments can also be from nonallelic genes showing some degree of structural and usually functional relatedness (e.g., different genes within a superfamily such as the immunoglobulin superfamily). The starting DNA segments can also be induced variants of each other. For example, one DNA segment can be produced by error-prone PCR replication of the other, or by substitution of a mutagenic cassette. Induced mutants can also be prepared by propagating one (or both) of the segments in a mutagenic strain. In these situations, strictly speaking, the second DNA segment is not a single segment but a large family of related segments. The different segments forming the starting materials are often the same length or substantially the same length. However, this need not be the case; for example, one segment can be a subsequence of another. The segments can be present as part of larger molecules, such as vectors, or can be in isolated form.
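The requirement for at least two divergent positions can be checked by enumeration. A minimal sketch, assuming equal-length, pre-aligned substrates that differ only by substitutions, and idealized recombination in which any combination of the bases present at each position can arise; `recombinant_space` and `novel_recombinants` are hypothetical names.

```python
from itertools import product

def recombinant_space(parents):
    """All sequences reachable by choosing, at each position, a base seen in some parent."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("this sketch assumes equal-length, pre-aligned substrates")
    alleles = [sorted({p[i] for p in parents}) for i in range(length)]
    return {"".join(combo) for combo in product(*alleles)}

def novel_recombinants(parents):
    """Recombinants that are not already present among the starting substrates."""
    return recombinant_space(parents) - set(parents)
```

For two substrates differing at a single position, `novel_recombinants(["GATC", "GATT"])` is empty, so recombination adds nothing. For three substrates arranged as in the text, `["GATC", "GATT", "CATT"]`, it yields the single new combination `CATC`.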

The starting DNA segments are recombined by any of the recursive sequence recombination formats provided herein to generate a diverse library of recombinant DNA segments. Such a library can vary widely in size from having fewer than 10 to more than 10⁵, 10⁹, or 10¹² members. In general, the starting segments and the recombinant libraries generated include full-length coding sequences and any essential regulatory sequences, such as a promoter and polyadenylation sequence, required for expression. However, if this is not the case, the recombinant DNA segments in the library can be inserted into a common vector providing the missing sequences before performing screening/selection.

If the recursive sequence recombination format employed is an in vivo format, the library of recombinant DNA segments generated already exists in a cell, which is usually the cell type in which expression of the evolved gene is desired. If recursive sequence recombination is performed in vitro, the recombinant library is preferably introduced into the desired cell type before screening/selection. The members of the recombinant library can be linked to an episome or virus before introduction or can be introduced directly. In some embodiments of the invention, the library is amplified in a first host, and is then recovered from that host and introduced to a second host more amenable to expression, selection, or screening, or any other desirable parameter. The manner in which the library is introduced into the cell type depends on the DNA-uptake characteristics of the cell type, e.g., having viral receptors, being capable of conjugation, or being naturally competent. If the cell type is insusceptible to natural and chemically induced competence, but susceptible to electroporation, one would usually employ electroporation. If the cell type is insusceptible to electroporation as well, one can employ biolistics. The biolistic PDS-1000 Gene Gun (Bio-Rad, Hercules, Calif.) uses helium pressure to accelerate DNA-coated gold or tungsten microcarriers toward target cells. The process is applicable to a wide range of tissues, including plants, bacteria, fungi, algae, intact animal tissues, tissue culture cells, and animal embryos. One can also employ electronic pulse delivery, which is essentially a mild electroporation format for live tissues in animals and patients (Zhao, Advanced Drug Delivery Reviews 17:257-262 (1995)). Novel methods for making cells competent are described in co-pending U.S. patent application Ser. No. 08/621,430, filed Mar. 25, 1996. After introduction of the library of recombinant genes, the cells are optionally propagated to allow expression of the genes to occur.
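The choice of delivery method described above amounts to a short decision procedure. The sketch below encodes only that reasoning. The `UptakeProfile` fields and `choose_delivery` are hypothetical names, and the relative priority of viral uptake, conjugation and natural competence is an assumption, since the text lists these routes without ranking them.

```python
from dataclasses import dataclass

@dataclass
class UptakeProfile:
    """DNA-uptake characteristics of a host cell type (hypothetical fields)."""
    has_viral_receptors: bool = False
    conjugation_capable: bool = False
    naturally_competent: bool = False
    chemically_competent: bool = False
    electroporation_susceptible: bool = False

def choose_delivery(cell: UptakeProfile) -> str:
    # Native uptake routes first (their relative order here is an assumption).
    if cell.has_viral_receptors:
        return "viral transduction"
    if cell.conjugation_capable:
        return "conjugation"
    if cell.naturally_competent:
        return "natural competence"
    if cell.chemically_competent:
        return "chemically induced competence"
    # Insusceptible to competence but susceptible to electroporation.
    if cell.electroporation_susceptible:
        return "electroporation"
    # Insusceptible to electroporation as well.
    return "biolistics"
```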

A. In Vitro Formats

One format for recursive sequence recombination utilizes a pool of related sequences. The sequences can be DNA or RNA and can be of various lengths depending on the size of the gene or DNA fragment to be recombined or reassembled. Preferably the sequences are from 50 bp to 100 kb.

The pool of related substrates can be fragmented, usually at random, into fragments of from about 5 bp to 5 kb or more. Preferably the size of the random fragments is from about 10 bp to 1000 bp; more preferably the size of the DNA fragments is from about 20 bp to 500 bp. The substrates can be digested by a number of different methods, such as DNase I or RNase digestion, random shearing, or restriction enzyme digestion. The concentration of nucleic acid fragments of a particular length is often less than 0.1% or 1% by weight of the total nucleic acid. The number of different specific nucleic acid fragments in the mixture is usually at least about 100, 500 or 1000.
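Random fragmentation followed by size selection can be sketched as follows. This is an idealization assumed for illustration: cut sites are drawn uniformly at random, which approximates DNase I digestion or random shearing but not restriction digestion, and size selection is a simple length filter standing in for gel purification. `random_fragments` and `digest_pool` are hypothetical names; the default 20-500 bp window follows the more preferred range above.

```python
import random

def random_fragments(seq, mean_length, rng):
    """Cut one copy of `seq` at distinct random positions (idealized random digestion)."""
    n_cuts = min(len(seq) - 1, max(0, len(seq) // mean_length - 1))
    cuts = sorted(rng.sample(range(1, len(seq)), n_cuts))
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def digest_pool(parents, copies=50, mean_length=100, size_range=(20, 500), seed=0):
    """Digest many copies of each parent and keep fragments within `size_range` (bp)."""
    rng = random.Random(seed)
    low, high = size_range
    pool = []
    for parent in parents:
        for _ in range(copies):
            pool.extend(f for f in random_fragments(parent, mean_length, rng)
                        if low <= len(f) <= high)
    return pool
```

Digesting many copies of each parent matters: each copy is cut at different sites, so the pool contains overlapping fragments that can later prime on one another.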

The mixed population of nucleic acid fragments is denatured by heating to about 80° C. to 100° C., more preferably from 90° C. to 96° C., to form single-stranded nucleic acid fragments. Single-stranded nucleic acid fragments having regions of sequence identity with other single-stranded nucleic acid fragments can then be reannealed by cooling to 20° C. to 75° C., and preferably from 40° C. to 65° C. Renaturation can be accelerated by the addition of polyethylene glycol (“PEG”) or salt. The salt concentration is preferably from 0 mM to 600 mM; more preferably the salt concentration is from 10 mM to 100 mM. The salt may be, for example, (NH₄)₂SO₄, KCl, or NaCl. The concentration of PEG is preferably from 0% to 20%, more preferably from 5% to 10%. The fragments that reanneal can be from different substrates.

The annealed nucleic acid fragments are incubated in the presence of a nucleic acid polymerase, such as Taq or Klenow, Mg⁺⁺ at 1 mM-20 mM, and dNTPs (i.e., dATP, dCTP, dGTP and dTTP). If regions of sequence identity are large, Taq or another high-temperature polymerase can be used with an annealing temperature of between 45-65° C. If the areas of identity are small, Klenow or another low-temperature polymerase can be used with an annealing temperature of between 20-30° C. The polymerase can be added to the random nucleic acid fragments prior to annealing, simultaneously with annealing, or after annealing.

The cycle of denaturation, renaturation and incubation of random nucleic acid fragments in the presence of polymerase is sometimes referred to as “shuffling” of the nucleic acid in vitro. This cycle is repeated a desired number of times. Preferably the cycle is repeated from 2 to 100 times; more preferably it is repeated from 10 to 40 times. The resulting nucleic acids are a family of double-stranded polynucleotides of from about 50 bp to about 100 kb, preferably from 500 bp to 50 kb. The population represents variants of the starting substrates showing substantial sequence identity thereto but also diverging at several positions. The population has many more members than the starting substrates. The population of fragments resulting from recombination is preferably first amplified by PCR, then cloned into an appropriate vector, and the ligation mixture is used to transform host cells.
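The way repeated fragmentation and reassembly produces chimeras can be pictured with a deliberately simplified model. The sketch below is an illustration only, not a model of annealing thermodynamics: it assumes the parental sequences are already aligned and of equal length, and it treats each reassembled fragment as copied from a randomly chosen parent, so every fragment junction is a potential crossover. The function name `shuffle_parents` and the default fragment range of 20-60 bases (inside the 20 bp to 500 bp range stated above) are hypothetical choices.

```python
import random


def shuffle_parents(parents, min_frag=20, max_frag=60, rng=None):
    """Toy model of in vitro shuffling of aligned, equal-length parents.

    Each fragment of random length is copied from a randomly chosen parent,
    so every fragment junction is a possible template switch (crossover).
    """
    rng = rng or random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned and of equal length")
    pieces, pos = [], 0
    while pos < length:
        # Fragment length drawn uniformly from [min_frag, max_frag].
        end = min(length, pos + rng.randint(min_frag, max_frag))
        pieces.append(rng.choice(parents)[pos:end])
        pos = end
    return "".join(pieces)
```

Calling `shuffle_parents` repeatedly yields a library of chimeras; in this model, shorter fragments mean more junctions and therefore more crossovers per product.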

In a variation of in vitro shuffling, subsequences of recombination substrates can be generated by amplifying the full-length sequences under conditions which produce a substantial fraction, typically at least 20 percent or more, of incompletely extended amplification products. The amplification products, including the incompletely extended amplification products, are denatured and subjected to at least one additional cycle of reannealing and amplification. This variation, wherein at least one cycle of reannealing and amplification provides a substantial fraction of incompletely extended products, is termed “stuttering.” In the subsequent amplification round, the incompletely extended products anneal to and prime extension on different sequence-related template species.
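Stuttering can be sketched the same way, with the difference that the crossover points come from limited extension per cycle rather than from pre-made fragments. This is a hypothetical toy model: it assumes aligned, equal-length templates, a fixed maximum number of bases added per cycle (`bases_per_cycle`, an illustrative parameter), and that each partially extended product reanneals to a template picked at random.

```python
import random


def stutter(templates, bases_per_cycle, rng=None):
    """Toy model of 'stuttering' on aligned, equal-length templates.

    Each cycle extends the growing strand by at most bases_per_cycle bases
    (an incompletely extended product); the product is then denatured and
    reanneals to a randomly chosen related template for the next cycle.
    Returns the finished product and the number of cycles it took.
    """
    if bases_per_cycle <= 0:
        raise ValueError("bases_per_cycle must be positive")
    rng = rng or random.Random()
    length = len(templates[0])
    product, cycles = "", 0
    while len(product) < length:
        template = rng.choice(templates)  # reannealing picks any related template
        start = len(product)
        product += template[start:start + bases_per_cycle]
        cycles += 1
    return product, cycles
```

In this model a 200-base product built 30 bases per cycle needs 7 cycles, and each of the 6 internal cycle boundaries is a chance to switch template.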

In a further variation, at least one cycle of amplification can be conducted using a collection of overlapping single-stranded DNA fragments of related sequence and different lengths. Each fragment can hybridize to and prime polynucleotide chain extension of a second fragment from the collection, thus forming sequence-recombined polynucleotides. In a further variation, single-stranded DNA fragments of variable length can be generated from a single primer by Vent DNA polymerase on a first DNA template. The single-stranded DNA fragments are used as primers for a second, Kunkel-type template, consisting of a uracil-containing circular single-stranded DNA. This results in multiple substitutions of the first template into the second (see Levichkin et al., Mol. Biology 29:572-577 (1995)).

Nucleic acid sequences can be recombined by recursive sequence recombination even if they lack sequence homology. Homology can be introduced using synthetic oligonucleotides as PCR primers. In addition to the specific sequences for the nucleic acid segment being amplified, all of the primers used to amplify one particular segment are synthesized to contain an additional sequence of 20-40 bases 5′ to the segment (sequence A) and a different 20-40 base sequence 3′ to the segment (sequence B). An adjacent segment is amplified using a 5′ primer which contains the complementary strand of sequence B (sequence B′) and a 3′ primer containing a different 20-40 base sequence (C). Similarly, primers for the next adjacent segment contain sequences C′ (complementary to C) and D. In this way, small regions of homology are introduced, making the segments into site-specific recombination cassettes. After the initial amplification of individual segments, the amplified segments can be mixed and subjected to primerless PCR.
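The tag scheme can be checked in silico. The sketch below is hypothetical: it writes every tag on the top strand, so adjacent products end and begin with the same tag (its complement, B′ in the text's notation, lies on the opposite strand), and it replaces primerless PCR with a simple exact-overlap join. Tag length is a parameter; the text specifies 20-40 bases, and the 4-base tags in any quick test are only for readability.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (uppercase ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def tagged_products(segments, tags):
    """Top-strand sequence of each segment after amplification with tailed primers.

    tags needs one more entry than segments: tag i sits 5' of segment i and
    tag i+1 sits 3' of it, so neighbouring products share one tag, which lets
    the top strand of one anneal to the bottom (complementary) strand of the next.
    """
    if len(tags) != len(segments) + 1:
        raise ValueError("need exactly one more tag than segments")
    return [tags[i] + seg + tags[i + 1] for i, seg in enumerate(segments)]


def reverse_primer(product, n):
    """3' (reverse) primer for a product: reverse complement of its last n bases."""
    return revcomp(product[-n:])


def assemble(products, tag_len):
    """In silico stand-in for primerless PCR: fuse products through their
    shared junction tags (an exact overlap of tag_len bases is assumed)."""
    full = products[0]
    for prod in products[1:]:
        if full[-tag_len:] != prod[:tag_len]:
            raise ValueError("adjacent products do not share a junction tag")
        full += prod[tag_len:]
    return full
```

Because each junction tag is unique, the products can only join in their original order, which is what turns them into ordered recombination cassettes.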

When domains within a polypeptide are shuffled, it may not be possible to introduce additional flanking sequences to the domains, due to the constraint of maintaining a continuous open reading frame. Instead, groups of oligonucleotides are synthesized that are homologous to the 3′ end of the first domain encoded by one of the genes to be shuffled and to the 5′ ends of the second domains encoded by all of the other genes to be shuffled together. This is repeated for all domains, thus providing sequences that allow recombination between protein domains while maintaining their order.
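The combinatorics of these bridging oligonucleotides can be made concrete with a short sketch. It assumes each gene is supplied as an ordered list of in-frame domain sequences, and builds, for every junction and every ordered pair of different genes, one chimeric oligo whose 5′ half matches the end of domain k of one gene and whose 3′ half matches the start of domain k+1 of another. The function name and the `arm` length per half are hypothetical, not values from the text.

```python
from itertools import permutations


def bridging_oligos(domain_sets, arm=15):
    """Chimeric junction oligos for domain shuffling.

    domain_sets[g][k] is domain k of gene g. For each junction k and each
    ordered pair (i, j) of different genes, the oligo is the last `arm` bases
    of domain k of gene i joined to the first `arm` bases of domain k+1 of
    gene j. No extra sequence is added, so the reading frame is untouched.
    """
    n_junctions = len(domain_sets[0]) - 1
    oligos = {}
    for k in range(n_junctions):
        for i, j in permutations(range(len(domain_sets)), 2):
            left = domain_sets[i][k][-arm:]
            right = domain_sets[j][k + 1][:arm]
            oligos[(k, i, j)] = left + right
    return oligos
```

For g genes and d domains this gives (d - 1) x g x (g - 1) oligos, e.g. 12 for three genes with three domains each.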

B. In Vivo Formats

1. Plasmid-Plasmid Recombination

The initial substrates for recombination are a collection of polynucleotides comprising variant forms of a gene. The variant forms usually show substantial sequence identity to each other, sufficient to allow homologous recombination between substrates. The diversity between the polynucleotides can be natural (e.g., allelic or species variants), induced (e.g., error-prone PCR or error-prone recursive sequence recombination), or the result of in vitro recombination. Diversity can also result from resynthesizing genes encoding natural proteins with alternative codon usage. There should be at least sufficient diversity between substrates that recombination can generate more diverse products than there are starting materials. There must be at least two substrates differing in at least two positions. However, a library of substrates of 10³-10⁸ members is commonly employed. The degree of diversity depends on the length of the substrate being recombined and the extent of the functional change to be evolved. Diversity at between 0.1-25% of positions is typical. The diverse substrates are incorporated into plasmids. The plasmids are often standard cloning vectors, e.g., bacterial multicopy plasmids. However, in some methods described below, the plasmids include mobilization (MOB) functions. The substrates can be incorporated into the same or different plasmids. Often at least two different types of plasmid having different types of selectable markers are used to allow selection for cells containing at least two types of vector. Also, where different types of plasmid are employed, the different plasmids can come from two distinct incompatibility groups to allow stable co-existence of two different plasmids within the cell. Nevertheless, plasmids from the same incompatibility group can still co-exist within the same cell for sufficient time to allow homologous recombination to occur.

Plasmids containing diverse substrates are initially introduced into cells by any method (e.g., chemical transformation, natural competence, electroporation, biolistics, packaging into phage or viral systems). Often, the plasmids are present at or near saturating concentration (with respect to maximum transfection capacity) to increase the probability of more than one plasmid entering the same cell. The plasmids containing the various substrates can be transfected simultaneously or in multiple rounds. For example, in the latter approach cells can be transfected with a first aliquot of plasmid, transfectants selected and propagated, and then transfected with a second aliquot of plasmid.

Once the plasmids have been introduced into cells, recombination between substrates to generate recombinant genes occurs within cells containing multiple different plasmids merely by propagating the cells. However, cells that receive only one plasmid are unable to participate in recombination, and the potential contribution of substrates on such plasmids to evolution is not fully exploited (although these plasmids may contribute to some extent if they are propagated in mutator cells). The rate of evolution can be increased by allowing all substrates to participate in recombination. This can be achieved by subjecting transfected cells to electroporation. The conditions for electroporation are the same as those conventionally used for introducing exogenous DNA into cells (e.g., 1,000-2,500 volts, 400 μF and a 1-2 mm gap). Under these conditions, plasmids are exchanged between cells, allowing all substrates to participate in recombination. In addition, the products of recombination can undergo further rounds of recombination with each other or with the original substrate. The rate of evolution can also be increased by use of conjugative transfer. To exploit conjugative transfer, substrates can be cloned into plasmids having MOB genes, and tra genes are also provided in cis or in trans to the MOB genes. The effect of conjugative transfer is very similar to that of electroporation in that it allows plasmids to move between cells and allows recombination between any substrate and the products of previous recombination to occur, merely by propagating the culture. The rate of evolution can also be increased by fusing cells to induce exchange of plasmids or chromosomes. Fusion can be induced by chemical agents, such as PEG, or by viral proteins, such as influenza virus hemagglutinin, HSV-1 gB and gD. The rate of evolution can also be increased by use of mutator host cells (e.g., Mut L, S, D, T, H in bacteria and Ataxia telangiectasia human cell lines).

The time for which cells are propagated and recombination is allowed to occur varies with the cell type but is generally not critical, because even a small degree of recombination can substantially increase diversity relative to the starting materials. Cells bearing plasmids containing recombined genes are subjected to screening or selection for a desired function. For example, if the substrate being evolved contains a drug resistance gene, one would select for drug resistance. Cells surviving screening or selection can be subjected to one or more rounds of screening/selection followed by recombination, or can be subjected directly to an additional round of recombination. “Screening” as used herein is intended to include “selection” as a type of screen.

The next round of recombination can be achieved by several different formats independently of the previous round. For example, a further round of recombination can be effected simply by resuming the electroporation or conjugation-mediated intercellular transfer of plasmids described above. Alternatively, a fresh substrate or substrates, the same as or different from previous substrates, can be transfected into cells surviving selection/screening. Optionally, the new substrates are included in plasmid vectors bearing a different selective marker and/or from a different incompatibility group than the original plasmids. As a further alternative, cells surviving selection/screening can be subdivided into two subpopulations, and plasmid DNA from one subpopulation transfected into the other, where the substrates from the plasmids of the two subpopulations undergo a further round of recombination. In either of the latter two options, the rate of evolution can be increased by employing DNA extraction, electroporation, conjugation or mutator cells, as described above. In a still further variation, DNA from cells surviving screening/selection can be extracted and subjected to in vitro recursive sequence recombination.

After the second round of recombination, a second round of screening/selection is performed, preferably under conditions of increased stringency. If desired, further rounds of recombination and selection/screening can be performed using the same strategy as for the second round. With successive rounds of recombination and selection/screening, the surviving recombined substrates evolve toward acquisition of a desired phenotype. Typically, in this and other methods of recursive recombination, the final product of recombination that has acquired the desired phenotype differs from the starting substrates at 0.1%-25% of positions and has evolved at a rate orders of magnitude in excess (e.g., by at least 10-fold, 100-fold, 1000-fold, or 10,000-fold) of the rate of evolution driven by naturally acquired mutation of about 1 mutation per 10⁹ positions per generation (see Anderson et al., Proc. Natl. Acad. Sci. U.S.A. 93:906-907 (1996)). The “final product” may be transferred to another host more desirable for utilization of the “shuffled” DNA. This is particularly advantageous in situations where the more desirable host is less efficient as a host for the many cycles of mutation/recombination due to the lack of the molecular biology or genetic tools available for other organisms such as E. coli.

2. Virus-Plasmid Recombination

The strategy used for plasmid-plasmid recombination can also be used for virus-plasmid recombination, usually phage-plasmid recombination. However, some additional comments particular to the use of viruses are appropriate. The initial substrates for recombination are cloned into both plasmid and viral vectors. It is usually not critical which substrate(s) is/are inserted into the viral vector and which into the plasmid, although usually the viral vector should contain different substrate(s) from the plasmid. As before, the plasmid (and the virus) typically contains a selective marker. The plasmid and viral vectors can both be introduced into cells by transfection as described above. However, a more efficient procedure is to transfect the cells with plasmid, select transfectants and infect the transfectants with virus. Because the efficiency of infection of many viruses approaches 100% of cells, most cells transfected and infected by this route contain both a plasmid and a virus bearing different substrates.

Homologous recombination occurs between plasmid and virus, generating both recombined plasmids and recombined virus. For some viruses, such as filamentous phage, in which intracellular DNA exists in both double-stranded and single-stranded forms, both forms can participate in recombination. Provided that the virus is not one that rapidly kills cells, recombination can be augmented by use of electroporation or conjugation to transfer plasmids between cells. Recombination can also be augmented for some types of virus by allowing the progeny virus from one cell to reinfect other cells. For some types of virus, virus-infected cells show resistance to superinfection. However, such resistance can be overcome by infecting at high multiplicity and/or using mutant strains of the virus in which resistance to superinfection is reduced.

The result of infecting plasmid-containing cells with virus depends on the nature of the virus. Some viruses, such as filamentous phage, stably exist with a plasmid in the cell and also extrude progeny phage from the cell. Other viruses, such as lambda having a cosmid genome, stably exist in a cell like plasmids without producing progeny virions. Other viruses, such as the T-phage and lytic lambda, undergo recombination with the plasmid but ultimately kill the host cell and destroy plasmid DNA. For viruses that infect cells without killing the host, cells containing recombinant plasmids and virus can be screened/selected using the same approach as for plasmid-plasmid recombination. Progeny virus extruded by cells surviving selection/screening can also be collected and used as substrates in subsequent rounds of recombination. For viruses that kill their host cells, recombinant genes resulting from recombination reside only in the progeny virus. If the screening or selective assay requires expression of recombinant genes in a cell, the recombinant genes should be transferred from the progeny virus to another vector, e.g., a plasmid vector, and retransfected into cells before selection/screening is performed.

For filamentous phage, the products of recombination are present both in cells surviving recombination and in phage extruded from these cells. The dual source of recombinant products provides some additional options relative to plasmid-plasmid recombination. For example, DNA can be isolated from phage particles for use in a round of in vitro recombination. Alternatively, the progeny phage can be used to transfect or infect cells surviving a previous round of screening/selection, or fresh cells transfected with fresh substrates for recombination.

3. Virus-Virus Recombination

The principles described for plasmid-plasmid and plasmid-viral recombination can be applied to virus-virus recombination with a few modifications. The initial substrates for recombination are cloned into a viral vector. Usually, the same vector is used for all substrates. Preferably, the virus is one that, naturally or as a result of mutation, does not kill cells. After insertion, some viral genomes can be packaged in vitro or using a packaging cell line. The packaged viruses are used to infect cells at high multiplicity such that there is a high probability that a cell will receive multiple viruses bearing different substrates.
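How "high multiplicity" translates into cells carrying different substrates can be estimated with a simple model. The sketch assumes that the number of virions entering a cell is Poisson-distributed with mean equal to the multiplicity of infection (MOI), and that the library consists of `n_variants` equally represented substrates; both are simplifying assumptions, and the function name is hypothetical.

```python
import math


def prob_distinct_variants(moi, n_variants, max_k=60):
    """Probability that one cell receives at least two *different* substrates.

    Assumes k ~ Poisson(moi) virions per cell and n_variants equally frequent
    variants. Given k virions, all carry the same variant with probability
    n_variants ** (1 - k); the Poisson sum is truncated at max_k.
    """
    total = 0.0
    for k in range(2, max_k + 1):
        p_k = math.exp(-moi) * moi ** k / math.factorial(k)
        total += p_k * (1.0 - n_variants ** (1 - k))
    return total
```

Under these assumptions a diverse library infected at an MOI of 1 puts two or more different substrates into only about a quarter of the cells, whereas an MOI of 5 raises this to roughly 96%.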

After the initial round of infection, subsequent steps depend on the nature of infection as discussed in the previous section. For example, if the viruses have phagemid genomes (Sambrook et al., Molecular Cloning, CSH Press, 1987) such as lambda cosmids or M13, f1 or fd phagemids, the phagemids behave as plasmids within the cell and undergo recombination simply by propagating the cells. Recombination is particularly efficient between single-stranded forms of intracellular DNA. Recombination can be augmented by electroporation of cells.

Following selection/screening, cosmids containing recombinant genes can be recovered from surviving cells, e.g., by heat induction of a cos⁻ lysogenic host cell, or extraction of DNA by standard procedures, followed by repackaging of the cosmid DNA in vitro.

If the viruses are filamentous phage, recombination of replicative form DNA occurs by propagating the culture of infected cells. Selection/screening identifies colonies of cells containing viral vectors having recombinant genes with improved properties, together with infectious particles (i.e., phage or packaged phagemids) extruded from such cells. Subsequent options are essentially the same as for plasmid-viral recombination.

4. Chromosome Recombination

This format is especially useful for evolving chromosomal substrates. The format is particularly preferred in situations in which many chromosomal genes contribute to a phenotype or one does not know the exact location of the chromosomal gene(s) to be evolved. The initial substrates for recombination are cloned into a plasmid vector. If the chromosomal gene(s) to be evolved are known, the substrates constitute a family of sequences showing a high degree of sequence identity but some divergence from the chromosomal gene. If the chromosomal genes to be evolved have not been located, the initial substrates usually constitute a library of DNA segments of which only a small number show sequence identity to the gene or genes to be evolved. Divergence between the plasmid-borne substrate and the chromosomal gene(s) can be induced by mutagenesis or by obtaining the plasmid-borne substrates from a different species than that of the cells bearing the chromosome.

The plasmids bearing substrates for recombination are transfected into cells having the chromosomal gene(s) to be evolved. Evolution can occur simply by propagating the culture, and can be accelerated by transferring plasmids between cells by conjugation or electroporation. Evolution can be further accelerated by use of mutator host cells or by seeding a culture of nonmutator host cells being evolved with mutator host cells and inducing intercellular transfer of plasmids by electroporation or conjugation. Preferably, mutator host cells used for seeding contain a negative selectable marker to facilitate isolation of a pure culture of the nonmutator cells being evolved. Selection/screening identifies cells bearing chromosomes and/or plasmids that have evolved toward acquisition of a desired function.

Subsequent rounds of recombination and selection/screening proceed in similar fashion to those described for plasmid-plasmid recombination. For example, further recombination can be effected by propagating cells surviving recombination in combination with electroporation or conjugative transfer of plasmids. Alternatively, plasmids bearing additional substrates for recombination can be introduced into the surviving cells. Preferably, such plasmids are from a different incompatibility group and bear a different selective marker than the original plasmids to allow selection for cells containing at least two different plasmids. As a further alternative, plasmid and/or chromosomal DNA can be isolated from a subpopulation of surviving cells and transfected into a second subpopulation. Chromosomal DNA can be cloned into a plasmid vector before transfection.

5. Virus-Chromosome Recombination

As in the other methods described above, the virus is usually one that does not kill the cells, and is often a phage or phagemid. The procedure is substantially the same as for plasmid-chromosome recombination. Substrates for recombination are cloned into the vector. Vectors including the substrates can then be transfected into cells or packaged in vitro and introduced into cells by infection. Viral genomes recombine with host chromosomes merely by propagating a culture. Evolution can be accelerated by allowing intercellular transfer of viral genomes by electroporation, or reinfection of cells by progeny virions. Screening/selection identifies cells having chromosomes and/or viral genomes that have evolved toward acquisition of a desired function.

There are several options for subsequent rounds of recombination. For example, viral genomes can be transferred between cells surviving selection/recombination by electroporation. Alternatively, viruses extruded from cells surviving selection/screening can be pooled and used to superinfect the cells at high multiplicity. Alternatively, fresh substrates for recombination can be introduced into the cells, either on plasmid or viral vectors.

II. Application of Recursive Sequence Recombination to Evolution of Polypeptides

In addition to the techniques described above, some additional advantageous modifications of these techniques for the evolution of polypeptides are described below. These methods are referred to as “fine grain” and “coarse grain” shuffling. The coarse grain methods allow one to exchange chunks of genetic material between substrate nucleic acids, thereby limiting diversity in the resulting recombinants to exchanges or substitutions of domains, restriction fragments, oligo-encoded blocks of mutations, or other arbitrarily defined segments, rather than introducing diversity more randomly across the substrate. In contrast to coarse grain shuffling, fine grain shuffling methods allow the generation of all possible recombinations, or permutations, of a given set of very closely linked mutations, including multiple permutations, within a single segment, such as a codon.

In some embodiments, coarse grain or fine grain shuffling techniques are not performed as exhaustive searches of all possible mutations within a nucleic acid sequence. Rather, these techniques are utilized to provide a sampling of the variation possible within a gene based on known sequence or structural information. The size of the sample is typically determined by the nature of the screen or selection process. For example, when a screen is performed in a 96-well microtiter format, it may be preferable to limit the size of the recombinant library to about 100 such microtiter plates (about 9,600 clones) for convenience in screening.
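A quick calculation shows how such a screening budget limits the number of sites that can be varied. The sketch assumes each site is two-state (wild type or mutant), so k sites give 2**k equally likely variants, and uses the standard Poisson approximation for the expected fraction of variants seen at least once when clones are picked at random. The function names and the 95% coverage target are hypothetical illustrations.

```python
import math


def plates_to_clones(plates, wells=96):
    """Number of clones screened in a given number of microtiter plates."""
    return plates * wells


def expected_coverage(n_clones, n_variants):
    """Expected fraction of n_variants equally likely variants observed at
    least once among n_clones randomly picked clones (Poisson approximation)."""
    return 1.0 - math.exp(-n_clones / n_variants)


def max_sites(n_clones, target=None):
    """Largest k such that 2**k two-state variants fit the screen.

    With target=None, require 2**k <= n_clones; otherwise require the
    expected coverage of all 2**k variants to be at least `target`.
    """
    k = 0
    while True:
        nxt = 2 ** (k + 1)
        if target is None:
            ok = nxt <= n_clones
        else:
            ok = expected_coverage(n_clones, nxt) >= target
        if not ok:
            return k
        k += 1
```

With 100 plates (9,600 clones), up to 13 two-state sites fit if each variant needs to appear only once on paper, but random picking then covers only about 69% of the 8,192 combinations; about 11 sites keep the expected coverage above 95%.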

A. Use of Restriction Enzyme Sites to Recombine Mutations

In some situations it is advantageous to use restriction enzyme sites in nucleic acids to direct the recombination of mutations in a nucleic acid sequence of interest. These techniques are particularly preferred in the evolution of fragments that cannot readily be shuffled by existing methods due to the presence of repeated DNA or other problematic primary sequence motifs. They are also preferred for shuffling large fragments (typically greater than 10 kb), such as gene clusters that cannot be readily shuffled and “PCR-amplified” because of their size. Although fragments up to 50 kb have been reported to be amplified by PCR (Barnes, Proc. Natl. Acad. Sci. (U.S.A.) 91:2216-2220 (1994)), amplification can be problematic for fragments over 10 kb, and thus alternative methods for shuffling in the range of 10-50 kb and beyond are preferred. Preferably, the restriction endonucleases used are of the Type II class (Sambrook et al., Molecular Cloning, CSH Press, 1987), and of these, preferably those which generate nonpalindromic sticky-end overhangs, such as AlwNI, SfiI or BstXI. These enzymes generate nonpalindromic ends that allow for efficient ordered reassembly with DNA ligase. Typically, restriction enzyme (or endonuclease) sites are identified by conventional restriction enzyme mapping techniques (Sambrook et al., Molecular Cloning, CSH Press, 1987), by analysis of sequence information for that gene, or by introduction of desired restriction sites into a nucleic acid sequence by synthesis (i.e., by incorporation of silent mutations).

The DNA substrate molecules to be digested can be either from in vivo replicated DNA, such as a plasmid preparation, or from PCR-amplified nucleic acid fragments harboring the restriction enzyme recognition sites of interest, preferably near the ends of the fragment. Typically, at least two variants of a gene of interest, each having one or more mutations, are digested with at least one restriction enzyme determined to cut within the nucleic acid sequence of interest. The restriction fragments are then joined with DNA ligase to generate full-length genes having shuffled regions. The number of regions shuffled depends on the number of cuts within the nucleic acid sequence of interest. The shuffled molecules can be introduced into cells as described above and screened or selected for a desired property. Nucleic acid can then be isolated from pools (libraries) or clones having desired properties and subjected to the same procedure until a desired degree of improvement is obtained.
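The digestion-ligation cycle can be sketched in silico for SfiI, whose recognition sequence is GGCCNNNN^NGGCC. The sketch makes several simplifying assumptions: it tracks only the top-strand cut position (ignoring the 3-nt overhang), it requires all variants to carry the same sites at the same positions, and it assumes the distinct nonpalindromic overhangs force fragments to religate in their original order, so it simply enumerates every combination of fragments across slots. Function names are hypothetical.

```python
import re
from itertools import product

# SfiI recognition sequence GGCCNNNNNGGCC; top strand is cut after GGCCNNNN.
SFII_SITE = re.compile(r"GGCC[ACGT]{5}GGCC")


def cut_positions(seq):
    """Top-strand SfiI cut positions (8 bases after the start of each site)."""
    return [m.start() + 8 for m in SFII_SITE.finditer(seq)]


def digest(seq, cuts):
    """Split seq at the given top-strand cut positions."""
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def shuffle_by_digestion(variants):
    """All distinct full-length products from digesting the variants at shared
    sites and religating fragment slots in their original order."""
    cuts = cut_positions(variants[0])
    if any(cut_positions(v) != cuts for v in variants):
        raise ValueError("variants must share the same restriction sites")
    frag_sets = [digest(v, cuts) for v in variants]
    slots = list(zip(*frag_sets))  # slot k holds fragment k from every variant
    return {"".join(combo) for combo in product(*(set(s) for s in slots))}
```

Because SfiI specifies 8 of its 13 bases, a random sequence contains on average about one site per 4⁸ = 65,536 bp, which is why such sites are rare enough to serve as engineered handles in large clusters.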

In some embodiments, at least one DNA substrate molecule or fragment thereof is isolated and subjected to mutagenesis. In some embodiments, the pool or library of religated restriction fragments is subjected to mutagenesis before the digestion-ligation process is repeated. “Mutagenesis” as used herein comprises such techniques known in the art as PCR mutagenesis, oligonucleotide-directed mutagenesis, site-directed mutagenesis, etc., and recursive sequence recombination by any of the techniques described herein.

An example of the use of this format is in the manipulation of polyketide clusters. Polyketide clusters (Khosla et al., TIBTECH 14, September 1996) are typically 10 to 100 kb in length, specifying multiple large polypeptides which assemble into very large multienzyme complexes. Due to the modular nature of these complexes and of the biosynthetic pathway, nucleic acids encoding protein modules can be exchanged between different polyketide clusters to generate novel and functional chimeric polyketides. The introduction of rare restriction endonuclease sites such as SfiI (eight-base recognition, nonpalindromic overhangs) at nonessential sites between polypeptides or in introns engineered within polypeptides would provide “handles” with which to manipulate the exchange of nucleic acid segments using the technique described above.

B. Reassembly PCR

A further technique for recursively recombining mutations in a nucleic acid sequence utilizes “reassembly PCR”. This method can be used to assemble multiple segments that have been separately evolved into a full-length nucleic acid template such as a gene. This technique is performed when a pool of advantageous mutants is known from previous work or has been identified by screening mutants that may have been created by any mutagenesis technique known in the art, such as PCR mutagenesis, cassette mutagenesis, doped oligo mutagenesis, chemical mutagenesis, or propagation of the DNA template in vivo in mutator strains. Boundaries defining segments of a nucleic acid sequence of interest preferably lie in intergenic regions, introns, or areas of a gene not likely to have mutations of interest. Preferably, oligonucleotide primers (oligos) are synthesized for PCR amplification of segments of the nucleic acid sequence of interest, such that the sequences of the oligonucleotides overlap the junctions of two segments. The overlap region is typically about 10 to 100 nucleotides in length. Each of the segments is amplified with a set of such primers. The PCR products are then “reassembled” according to assembly protocols such as those used in Sections IA-B above to assemble randomly fragmented genes. In brief, in an assembly protocol the PCR products are first purified away from the primers by, for example, gel electrophoresis or size exclusion chromatography. Purified products are mixed together and subjected to about 1-10 cycles of denaturing, reannealing, and extension in the presence of polymerase, deoxynucleoside triphosphates (dNTPs) and appropriate buffer salts in the absence of additional primers (“self-priming”). Subsequent PCR with primers flanking the gene is used to amplify the yield of the fully reassembled and shuffled genes. This method is necessarily “coarse grain” and hence only recombines mutations in a blockwise fashion, an advantage for some searches, such as when recombining allelic variants of multiple genes within an operon.
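The segment-and-overlap logic can be illustrated with a minimal sketch. It assumes the segment boundaries are chosen in regions where the variants are identical, so the overlapping junctions match exactly, and it stands in for self-priming assembly with a simple exact-overlap join. The overlap is centered on each boundary, as a junction-spanning primer would be; the function names and the 20-base default (inside the 10 to 100 nucleotide range above) are hypothetical.

```python
def split_with_overlap(seq, boundaries, overlap=20):
    """Cut seq into amplicons at the given boundaries; neighbouring amplicons
    share `overlap` bases centred on each junction."""
    half = overlap // 2
    edges = [0] + list(boundaries) + [len(seq)]
    amplicons = []
    for i in range(len(edges) - 1):
        start = edges[i] - half if i > 0 else 0
        end = edges[i + 1] + (overlap - half) if i < len(edges) - 2 else len(seq)
        amplicons.append(seq[start:end])
    return amplicons


def reassemble(amplicons, overlap=20):
    """Stand-in for self-priming assembly: fuse amplicons through their
    shared overlaps, which must match exactly."""
    full = amplicons[0]
    for amp in amplicons[1:]:
        if full[-overlap:] != amp[:overlap]:
            raise ValueError("amplicons do not overlap as expected")
        full += amp[overlap:]
    return full
```

Mixing amplicons taken from different evolved variants, for example block 1 from one variant and blocks 2 and 3 from another, and then reassembling them recombines mutations block by block. This is the coarse grain behavior described above.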

In some embodiments, the resulting reassembled genes are subjected to mutagenesis before the process is repeated.

In some embodiments, oligonucleotides that incorporate uracil are used as primers for PCR amplification. Typically uracil is incorporated at one site in the oligonucleotide. The products are treated with uracil DNA glycosylase, thereby generating a single-stranded overhang, and are reassembled in an ordered fashion by a method such as that disclosed by Rashtchian (Current Biology, 6:30-36 (1995)).

In a further embodiment, the PCR primers for amplification of segments of the nucleic acid sequence of interest are used to introduce variation into the gene of interest as follows. Mutations at sites of interest in a nucleic acid sequence are identified by screening or selection, by sequencing homologues of the nucleic acid sequence, and so on. Oligonucleotide PCR primers are then synthesized which encode wild-type or mutant information at sites of interest. These primers are then used in PCR mutagenesis to generate libraries of full-length genes encoding permutations of wild-type and mutant information at the designated positions. This technique is typically advantageous in cases where the screening or selection process is expensive, cumbersome, or impractical relative to the cost of sequencing the genes of mutants of interest and synthesizing mutagenic oligonucleotides.
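The primer design and the resulting library can be sketched as follows. This is a hypothetical illustration: mutations are given as a mapping from position to mutant base, each site gets a wild-type/mutant oligo pair spanning the site plus `flank` bases on either side, and the library is the full set of wild-type/mutant permutations (2**k members for k sites). Sites closer together than one oligo length would in practice be carried on the same primer; the sketch ignores that linkage.

```python
from itertools import product


def mutagenic_oligo_pairs(wild_type, mutations, flank=10):
    """For each site, a (wild-type, mutant) oligo pair spanning site +/- flank."""
    pairs = {}
    for pos, base in mutations.items():
        lo, hi = max(0, pos - flank), min(len(wild_type), pos + flank + 1)
        wt_oligo = wild_type[lo:hi]
        mut_oligo = wt_oligo[:pos - lo] + base + wt_oligo[pos - lo + 1:]
        pairs[pos] = (wt_oligo, mut_oligo)
    return pairs


def permutation_library(wild_type, mutations):
    """Every combination of wild-type or mutant base at the designated sites."""
    sites = sorted(mutations)
    library = []
    for picks in product((False, True), repeat=len(sites)):
        seq = list(wild_type)
        for pos, use_mutant in zip(sites, picks):
            if use_mutant:
                seq[pos] = mutations[pos]
        library.append("".join(seq))
    return library
```

For instance, two designated sites give four genes: wild type, each single mutant, and the double mutant.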

An example of this method is the evolution of an improved Taq polymerase, as described in detail below. Mutant proteins resulting from application of the method are identified and assayed in a sequencing reaction to identify mutants with improved sequencing properties. This is typically done in a high-throughput format (see, for example, Broach et al., Nature 384 (Supp): 14-16 (1996)) to yield, after screening, a small number (e.g., about 2 to 100) of candidate recombinants for further evaluation. The mutant genes can then be sequenced to provide information regarding the location of the mutations. The corresponding mutagenic oligonucleotide primers can be synthesized from this information and used in a reassembly reaction as described above to efficiently generate a library with an average of many mutations per gene. Thus, multiple rounds of this protocol allow an efficient search for improved variants of the Taq polymerase.

C. Enrichment for Mutant Sequence Information

In some embodiments of the invention, recombination reactions, such as those discussed above, are enriched for mutant sequences so that the multiple mutant spectrum, i.e., the possible combinations of mutations, is more efficiently sampled. The rationale for this is as follows. Assume that a number, n, of mutant clones with improved activity is obtained, wherein each clone has a single point mutation at a different position in the nucleic acid sequence. If this population of mutant clones, with an average of one mutation of interest per nucleic acid sequence, is then put into a recombination reaction, the resulting population will still have an average of one mutation of interest per nucleic acid sequence, distributed approximately according to a Poisson distribution, leaving the multiple mutation spectrum relatively unpopulated.
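The argument can be checked with a small simulation. The model is an assumption chosen for simplicity: there are n parents, parent i carries a single beneficial mutation at site i, and every progeny inherits each site from an independently chosen parent (fully unlinked sites, the most favorable case for recombination). Each site is then mutant with probability 1/n, so the mutation count per progeny is binomial with mean 1, close to Poisson with mean 1. The function name is hypothetical.

```python
import random


def recombine_single_mutants(n_clones, n_progeny, rng):
    """Toy model: parent i carries one mutation at site i; each progeny takes
    every site from an independently chosen parent. Returns the number of
    mutations carried by each progeny."""
    counts = []
    for _ in range(n_progeny):
        counts.append(sum(rng.randrange(n_clones) == site for site in range(n_clones)))
    return counts
```

With 10 parents, the mean stays at about one mutation per progeny and only about 26% of progeny carry two or more mutations, so most of the screening effort is spent on clones with zero or one mutation.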

The amount of screening required to identify recombinants having two or more mutations can be dramatically reduced by the following technique. The nucleic acid sequences of interest are obtained from a pool of mutant clones and prepared as fragments, typically by digestion with a restriction endonuclease, sonication, or by PCR amplification. The fragments are denatured, then allowed to reanneal, thereby generating mismatched hybrids where one strand of a mutant has hybridized with a complementary strand from a different mutant or wild-type clone. The reannealed products are then cut into fragments of about 20-100 bp, for example, by the use of DNase I. This fragmentation reaction has the effect of segregating regions of the template containing mismatches (mutant information) from those encoding wild type sequence. The mismatched hybrids can then be affinity purified using aptamers, dyes, or other agents which bind to mismatched DNA. A preferred embodiment is the use of a mutS protein affinity matrix (Wagner et al., Nucleic Acids Res. 23(19):3944-3948 (1995); Su et al., Proc. Natl. Acad. Sci. (U.S.A.), 83:5057-5061 (1986)) with a preferred step of amplifying the affinity-purified material in vitro prior to an assembly reaction. This amplified material is then put into an assembly PCR reaction as described above. Optionally, this material can be titrated against the original mutant pool (e.g., from about 100% to 10% of the mutS enriched pool) to control the average number of mutations per clone in the next round of recombination.

Another application of this method is in the assembly of gene constructs that are enriched for polymorphic bases occurring as natural or selected allelic variants or as differences between homologous genes of related species. For example, one may have several varieties of a plant that are believed to have heritable variation in a trait of interest (e.g., drought resistance). It is then of interest to construct a library of these variant genes containing many mutations per gene. MutS selection can be applied in combination with the assembly techniques described herein to generate such a pool of recombinants that are highly enriched for polymorphic (“mutant”) information. In some embodiments, the pool of recombinant genes is provided in a transgenic host. Recombinants can be further evolved by PCR amplification of the transgene from transgenic organisms that are determined to have an improved phenotype and applying the formats described in this invention to further evolve them.

D. Intron-Driven Recombination

In some instances, the substrate molecules for recombination have uniformly low homology, sporadically distributed regions of homology, or a relatively small region of homology (for example, about 10-100 bp), as with phage displayed peptide ligands. These factors can reduce the efficiency and randomness of recombination in RSR. In some embodiments of the invention, this problem is addressed by the introduction of introns between coding exons in sequences encoding protein homologues. In further embodiments of the invention, inteins can be used (Chong et al., J. Biol. Chem., 271:22159-22168 (1996)).

In this method, a nucleic acid sequence, such as a gene or gene family, is arbitrarily defined to have segments. The segments are preferably exons. Introns are engineered between the segments. Preferably, the intron inserted between the first and second segments is at least about 10% divergent from the intron inserted between the second and third segments, the intron inserted between the second and third segments is at least about 10% divergent from the introns inserted between any of the previous segment pairs, and so on through segments n and n+1. The introns between any given set of exons will thus initially be identical between all clones in the library, whereas the exons can be arbitrarily divergent in sequence. The introns therefore provide homologous DNA sequences that permit application of any of the described methods for RSR, while the exons can be arbitrarily small or divergent in sequence and can evolve to an arbitrarily large degree of sequence divergence without a significant loss in recombination efficiency. Restriction sites can also be engineered into the intronic sequences so as to allow directed reassembly of restriction fragments. The starting exon DNA may be synthesized de novo from sequence information, or may be present in any nucleic acid preparation (e.g., genomic, cDNA, libraries, and so on). For example, 1 to 10 nonhomologous introns can be designed to direct recombination of the nucleic acid sequences of interest by placing them between exons. The sequence of the introns can be obtained in whole or in part from known intron sequences. Preferably, the introns are self-splicing. Ordered sets of introns and exon libraries are assembled into functional genes by standard methods (Sambrook et al., Molecular Cloning, CSH Press (1987)).
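
One way to check a candidate intron set against the roughly 10% divergence criterion is sketched below in Python. It uses a simple ungapped (Hamming) comparison, which assumes the introns have already been aligned to equal length; a real design would more likely use a proper alignment. The function names and the short example sequences are hypothetical.

```python
from itertools import combinations


def divergence(a, b):
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def introns_sufficiently_divergent(introns, min_divergence=0.10):
    """True if every pair of introns differs at no less than min_divergence of positions."""
    return all(
        divergence(a, b) >= min_divergence
        for a, b in combinations(introns, 2)
    )


# Hypothetical 10-base intron stand-ins for junctions 1-2, 2-3 and 3-4.
introns = ["GTAAGTCTAG", "GTACGTCTTG", "CTAAGTCTAC"]
print(introns_sufficiently_divergent(introns))  # True
print(introns_sufficiently_divergent(introns + ["GTAAGTCTAG"]))  # False: duplicate intron
```

Checking every pair covers the requirement that each new intron differ from the introns at all previous junctions, not only from its immediate neighbour.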

Any of the formats for in vitro or in vivo recombination described herein can be applied for recursive exon shuffling. A preferred format is to use nonpalindromic restriction sites such as Sfi I placed into the intronic sequences to promote shuffling. Pools of selected clones are digested with Sfi I and religated. The nonpalindromic overhangs promote ordered reassembly of the shuffled exons. These libraries of genes can be expressed and screened for desired properties, then subjected to further recursive rounds of recombination by this process. In some embodiments, the libraries are subjected to mutagenesis before the process is repeated.
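
The requirement that Sfi I overhangs direct ordered reassembly can be checked computationally. Sfi I recognizes GGCCNNNNNGGCC and leaves 3-nucleotide 3′ overhangs drawn from the variable N region, so the overhang at each intron can be chosen independently. The hypothetical Python helper below, a sketch rather than a design tool, accepts a set of junction overhangs only if none is palindromic and none can pair with the overhang of a different junction.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs_allow_ordered_ligation(overhangs):
    """Return True if each junction overhang can only pair with its own partner end.

    Rejects palindromic overhangs (an end could ligate to a copy of itself) and
    any overhang that equals, or is complementary to, another junction's overhang.
    """
    seen = set()
    for overhang in overhangs:
        partner = reverse_complement(overhang)
        if overhang == partner:
            return False  # self-complementary end
        if overhang in seen or partner in seen:
            return False  # could cross-ligate with a different junction
        seen.update((overhang, partner))
    return True


# Hypothetical 3-nucleotide overhangs for three intron junctions.
print(overhangs_allow_ordered_ligation(["AAT", "GCT", "CAG"]))  # True
print(overhangs_allow_ordered_ligation(["AAT", "ATT"]))  # False: ATT pairs with AAT
```

Because a reverse-complement palindrome must have even length, 3-nucleotide overhangs such as those left by Sfi I can never be self-complementary; in practice the cross-junction check is the one that constrains the design.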

An example of how the introduction of an intron into a mammalian library format would be used advantageously is as follows. An intron containing a lox site (Sauer et al., Proc. Natl. Acad. Sci. (U.S.A.), 85:5166-5170 (1988)) is arbitrarily introduced between amino acids 92 and 93 in each alpha interferon parental substrate. A library of 10⁴ chimeric interferon genes is made for each of the two exons (residues 1-92 and residues 93-167), cloned into a replicating plasmid vector, and introduced into target cells. The number 10⁴ is arbitrarily chosen for convenience in screening. An exemplary vector for expression in mammalian cells would contain an SV40 origin, with the host cells expressing SV40 large T antigen, so as to allow transient expression of the interferon constructs. The cells are challenged with a cytopathic virus such as vesicular stomatitis virus (VSV) in an interferon protection assay (e.g., Meister et al., J. Gen. Virol. 67:1633-1643 (1986)). Cells surviving due to expression of interferon are recovered, and the two libraries of interferon genes are PCR amplified and recloned into a vector that can be amplified in E. coli. The amplified plasmids are then transfected at high multiplicity (e.g., 10 micrograms of plasmid per 10⁶ cells) into a cre expressing host that can support replication of that vector. The presence of cre in the host cells promotes efficient recombination at the lox site in the interferon intron, thus shuffling the selected sets of exons. This population of cells is then used in a second round of selection by viral challenge, and the process is applied recursively. In this format, the cre recombinase is preferably expressed transiently on a cotransfected molecule that cannot replicate in the host. Thus, after segregation of recombinants from the cre expressing plasmid, no further recombination will occur, and selection can be performed on genetically stable exon permutations. The method can be used with more than one intron, with recombination enhancing sequences other than cre/lox (e.g., int/xis, etc.), and with other vector systems such as, but not limited to, retroviruses, adenovirus or adeno-associated virus.
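
To give a sense of what “high multiplicity” means here, the Python arithmetic below converts 10 micrograms of plasmid per 10⁶ cells into molecules per cell. The plasmid size (5 kb) and the average mass of about 650 g/mol per base pair are assumptions for illustration, and the result counts molecules supplied rather than molecules that actually enter each cell.

```python
AVOGADRO = 6.022e23   # molecules per mole
MASS_PER_BP = 650.0   # approximate g/mol per base pair of double-stranded DNA (assumption)


def plasmid_molecules_per_cell(dna_ug, plasmid_bp, cells):
    """Plasmid molecules supplied per cell for a given DNA mass and cell number."""
    grams = dna_ug * 1e-6
    moles = grams / (plasmid_bp * MASS_PER_BP)
    return moles * AVOGADRO / cells


# 10 micrograms per 10**6 cells, assuming a hypothetical 5 kb plasmid.
print(f"{plasmid_molecules_per_cell(10, 5000, 1e6):.2e}")  # 1.85e+06
```

For a 5 kb plasmid this comes to roughly 1.9 × 10⁶ plasmid molecules supplied per cell, so even if only a small fraction is taken up, individual cells can receive several different exon combinations for cre to recombine.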

E. Synthetic Oligonucleotide Mediated Recombination

1. Oligo Bridge Across Sequence Space

In some embodiments of the invention, a search of a region of sequence space defined by a set of substrates, such as members of a gene family, having less than about 80%, more typically less than about 50%, homology is desired. This region, which can be part or all of a gene, is arbitrarily delineated into segments. The segment borders can be chosen randomly, based on correspondence with natural exons, based on structural considerations (loops, alpha helices, subdomains, whole domains, hydrophobic core, surface, dynamic simulations), or based on correlations with genetic mapping data.

Typically, the segments are then amplified by PCR with a pool of “bridge” oligonucleotides at each junction. Thus, if a set of five genes is broken into three segments A, B and C, so that there are five versions of each segment (A1, A2, . . . C4, C5), twenty-five oligonucleotides are made for each strand of the A-B junction, where each bridge oligonucleotide has 20 bases of homology to one of the A segments and one of the B segments. In some cases, the number of required oligonucleotides can be reduced by choosing segment boundaries that are identical in some or all of the gene family members. Oligonucleotides are similarly synthesized for the B-C junction. The family of A domains is amplified by PCR with an outside generic A primer and the pool of A-B junction oligonucleotides; the B domains with the A-B plus the B-C bridge oligonucleotides; and the C domains with the B-C bridge oligonucleotides plus a generic outside primer. Full length genes are then made by assembly PCR or by the dUTP/uracil glycosylase methods described above. Preferably, products from this step are subjected to mutagenesis before the process of selection and recombination is repeated, until a desired level of improvement or the evolution of a desired property is obtained. This is typically determined using a screen or selection appropriate for the protein and property of interest.
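
The bridge oligonucleotide design for one junction can be sketched in Python as follows. Segments are represented as plain strings, each bridge joins the last 20 bases of one A version to the first 20 bases of one B version, and only one strand is generated; the function name, the random example segments and the 60-base segment length are hypothetical.

```python
import random
from itertools import product


def bridge_oligos(left_segments, right_segments, arm=20):
    """One bridge oligonucleotide per (left, right) segment pair at a junction.

    Each oligo is the last `arm` bases of a left segment followed by the first
    `arm` bases of a right segment. Only one strand is returned; the other
    strand's pool would be the reverse complements.
    """
    oligos = {}
    for (left_name, left_seq), (right_name, right_seq) in product(
        left_segments.items(), right_segments.items()
    ):
        if len(left_seq) < arm or len(right_seq) < arm:
            raise ValueError("segment shorter than the homology arm")
        oligos[(left_name, right_name)] = left_seq[-arm:] + right_seq[:arm]
    return oligos


# Hypothetical stand-ins: five versions each of segments A and B.
rng = random.Random(0)


def random_segment(length):
    return "".join(rng.choice("ACGT") for _ in range(length))


segments_a = {f"A{i}": random_segment(60) for i in range(1, 6)}
segments_b = {f"B{i}": random_segment(60) for i in range(1, 6)}
junction_ab = bridge_oligos(segments_a, segments_b)
print(len(junction_ab))  # 25
```

The same call with the B and C segment dictionaries gives the B-C junction pool. If several family members share identical sequence at a boundary, their bridge oligonucleotides coincide, which is the reduction in oligonucleotide number mentioned above.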

This method is illustrated below for the recombination of eleven homologous human alpha interferon genes.

2. Site Directed Mutagenesis (SDM) with Oligonucleotides Encoding Homologue Mutations Followed by Shuffling

In some embodiments of the invention, sequence information from one or more substrate sequences is added to a given “parental” sequence of interest, with subsequent recombination between rounds of screening or selection. Typically, this is done with site-directed mutagenesis performed by techniques well known in the art (Sambrook et al., Molecular Cloning, CSH Press (1987)) with one substrate as template and oligonucleotides encoding single or multiple mutations from other substrate sequences, e.g., homologous genes. After screening or selection for an improved phenotype of interest, the selected recombinant(s) can be further evolved using RSR techniques described herein. After screening or selection, site-directed mutagenesis can be done again with another collection of oligonucleotides encoding homologue mutations, and the above process repeated until the desired properties are obtained.

When the difference between two homologues is one or more single point mutations in a codon, degenerate oligonucleotides can be used that encode the sequences in both homologues. One oligonucleotide may include many such degenerate codons and still allow one to exhaustively search all permutations over that block of sequence. An example of this is provided below for the evolution of alpha interferon genes.
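
The degenerate codons can be derived mechanically from the homologue sequences. The Python sketch below merges aligned codons position by position into IUPAC ambiguity codes and counts how many sequences the resulting block encodes; the function names and example codons are hypothetical.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases each code represents.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
CODE_SIZE = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_codon(*codons):
    """Merge aligned codons from several homologues into one degenerate codon."""
    if any(len(codon) != 3 for codon in codons):
        raise ValueError("every codon must be three bases long")
    return "".join(IUPAC[frozenset(column)] for column in zip(*codons))


def expansion_size(degenerate_seq):
    """Number of distinct DNA sequences encoded by a degenerate sequence."""
    total = 1
    for code in degenerate_seq:
        total *= CODE_SIZE[code]
    return total


# Hypothetical three-codon block: Ala/Thr, Lys/Arg and Glu/Asp differences.
homologue_1 = ["GCT", "AAA", "GAG"]
homologue_2 = ["ACT", "AGA", "GAT"]
block = "".join(degenerate_codon(a, b) for a, b in zip(homologue_1, homologue_2))
print(block, expansion_size(block))  # RCTARAGAK 8
```

When codons differ at more than one position, the degenerate codon also encodes combinations found in neither homologue: CTG (Leu) and ATC (Ile) merge to MTS, which additionally encodes CTC (Leu) and ATG (Met).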

When the homologue sequence space is very large, it can be advantageous to restrict the search to certain variants. Thus, for example, computer modelling tools (Lathrop et al., J. Mol. Biol., 255:641-665 (1996)) can be used to model each homologue mutation onto the target protein and discard any mutations that are predicted to grossly disrupt structure and function.

F. Recombination Directed by Host Machinery

In some embodiments of the invention, DNA substrate molecules are introduced into cells, wherein the cellular machinery directs their recombination. For example, a library of mutants is constructed and screened or selected for mutants with improved phenotypes by any of the techniques described herein. The DNA substrate molecules encoding the best candidates are recovered by any of the techniques described herein, then fragmented and used to transfect a mammalian host and screened or selected for improved function. The DNA substrate molecules are recovered from the mammalian host, such as by PCR, and the process is repeated until a desired level of improvement is obtained. In some embodiments, the fragments are denatured and reannealed prior to transfection, coated with recombination stimulating proteins such as recA, or co-transfected with a selectable marker such as Neo^(R) to allow the positive selection for cells receiving recombined versions of the gene of interest.

For example, this format is preferred for the in vivo affinity maturation of an antibody by RSR. In brief, a library of mutant antibodies is generated, as described herein for the 48G7 affinity maturation. This library is FACS purified with ligand to enrich for the 0.1-10% of antibodies with the highest affinity. The V region genes are recovered by PCR, fragmented, and cotransfected or electroporated with a vector into which reassembled V region genes can recombine. DNA substrate molecules are recovered from the cotransfected cells, and the process is repeated until the desired level of improvement is obtained. Other embodiments include reassembling the V regions prior to the electroporation so that an intact V region exon can recombine into an antibody expression cassette. Further embodiments include the use of this format for other eukaryotic genes or for the evolution of whole viruses.

G. Phagemid-Based Assembly

In some embodiments of the invention, a gene of interest is cloned into a vector that generates single stranded DNA, such as a phagemid. The resulting DNA substrate is mutagenized by RSR or any other method known in the art, transfected into host cells, and subjected to a screen or selection for a desired property or improved phenotype. DNA from the selected or screened phagemids is amplified by, for example, PCR or plasmid preparation. This DNA preparation contains the various mutant sequences that one wishes to permute. This DNA is fragmented, denatured, and annealed with single-stranded DNA (ssDNA) phagemid template (ssDNA encoding the wild-type gene and vector sequences). A preferred embodiment is the use of dut(−) ung(−) host strains such as CJ236 (Sambrook et al., Molecular Cloning, CSH Press (1987)) for the preparation of ssDNA.

Gaps in the annealed template are filled with DNA polymerase and ligated to form closed relaxed circles. Since multiple fragments can anneal to the phagemid, the newly synthesized strand now consists of shuffled sequences. These products are transformed into a mutS strain of E. coli which is dut+ ung+. Phagemid DNA is recovered from the transformed host and subjected again to this protocol until the desired level of improvement is obtained. The gene encoding the protein of interest in this library of recovered phagemid DNA can be mutagenized by any technique, including RSR, before the process is repeated.

III. Improved Protein Expression

While recombinant DNA technology has proved to be a very general method for obtaining large, pure, and homogeneous quantities of almost all nucleic acid sequences of interest, similar generality has not yet been achieved for the production of large amounts of pure, homogeneous protein in recombinant form. A likely explanation is that protein expression, folding, localization and stability are intrinsically more complex and unpredictable than for DNA. The yield of expressed protein is a complex function of transcription rates, translation rates, interactions with the ribosome, interaction of the nascent polypeptide with chaperonins and other proteins in the cell, efficiency of oligomerization, interaction with components of secretion and other protein trafficking pathways, protease sensitivity, and the intrinsic stability of the final folded state. Optimization of such complex processes is well suited for the application of RSR. The following methods detail strategies for application of RSR to the optimization of protein expression.

A. Evolution of Mutant Genes with Improved Expression Using RSR on Codon Usage Libraries

The negative effect of rare E. coli codons on expression of recombinant proteins in this host has been clearly demonstrated (Rosenberg et al., J. Bact. 175:716-722 (1993)). However, general rules for the choice of codon usage patterns to optimize expression of functional protein have been elusive. In some embodiments of the invention, protein expression is optimized by changing the codons used in the gene of interest, based on the degeneracy of the genetic code. Typically, this is accomplished by synthesizing the gene using degenerate oligonucleotides. In some embodiments, the degenerate oligonucleotides have the general structure of about 20 nucleotides of identity to a DNA substrate molecule encoding a protein of interest, followed by a region of about 20 degenerate nucleotides which encode a region of the protein, followed by another region of about 20 nucleotides of identity. In a preferred embodiment, the region of identity utilizes preferred codons for the host. In a further embodiment, the oligonucleotides are identical to the DNA substrate at least at one 5′ and one 3′ nucleotide but otherwise have at least 85% sequence homology to the DNA substrate molecule, the differences being due to the use of degenerate codons. In some embodiments, a set of such degenerate oligonucleotides is used in which each oligonucleotide overlaps with another by n−10 nucleotides, wherein n is the length of the oligonucleotide. Such oligonucleotides are typically about 20-1000 nucleotides in length. The assembled genes are then cloned, expressed, and screened or selected for improved expression. The assembled genes can be subjected to recursive recombination methods as described above until the desired improvement is achieved.
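
A minimal Python sketch of a codon usage library is given below. It back-translates a short protein region by picking a random synonymous codon for each residue and places it between two fixed flanks, mimicking one member of the flank, degenerate region, flank oligonucleotide structure described above. It draws codons uniformly from the standard genetic code; host codon preferences, the flanking sequences and the function names are assumptions for illustration.

```python
import random

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; "*" marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS.setdefault(amino_acid, []).append(codon)


def translate(dna):
    """Translate a DNA string codon by codon (no frame or stop checks)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def codon_variant(protein, rng):
    """Back-translate a protein, choosing a random synonymous codon per residue."""
    return "".join(rng.choice(SYNONYMS[aa]) for aa in protein)


def degenerate_oligo(left_flank, protein_region, right_flank, rng):
    """One library member: fixed flank + codon-randomized region + fixed flank."""
    return left_flank + codon_variant(protein_region, rng) + right_flank


rng = random.Random(1)
left = "ATGAGCGATAAAATTATTCA"   # hypothetical 20-nt region of identity
right = "CTGACCGATGATAGCTTTGA"  # hypothetical 20-nt region of identity
oligo = degenerate_oligo(left, "MKLVAAG", right, rng)
print(len(oligo), translate(oligo[20:41]))  # 61 MKLVAAG
```

For the 7-residue example region MKLVAAG, the synonymous codon counts (1 × 2 × 6 × 4 × 4 × 4 × 4) give 3,072 distinct DNA sequences, all encoding the same peptide.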

For example, this technique can be used to evolve bovine intestinal alkaline phosphatase (BIAP) for active expression in E. coli. This enzyme is commonly used as a reporter gene in assay formats such as ELISA. The cloned gene cannot be expressed in active form in good yield in a prokaryotic host such as E. coli. Development of such an expression system would allow one to access inexpensive expression technology for BIAP and, importantly, for engineered variants with improved activity or chemical coupling properties (such as chemical coupling to antibodies). A detailed example is provided in the Experimental Examples section.

B. Improved Folding

In some embodiments of the invention, proteins of interest, when overexpressed or expressed in heterologous hosts, form inclusion bodies, with the majority of the expressed protein being found in insoluble aggregates. Recursive sequence recombination techniques can be used to optimize folding of such target proteins. There are several ways to improve folding, including evolving the target protein of interest and evolving chaperonin proteins.

1. Evolving A Target Protein

a. Inclusion Body Fractionation Selection Using Lac Headpiece Dimer Fusion Protein

The lac repressor “headpiece dimer” is a small protein containing two headpiece domains connected by a short peptide linker, which binds the lac operator with sufficient affinity that polypeptide fusions to this headpiece dimer will remain bound to the plasmid that encodes them throughout an affinity purification process (Gates et al., J. Mol. Biol. 255:373-386 (1995)). This property can be exploited, as follows, to evolve mutant proteins of interest with improved folding properties. The protein of interest can be mammalian, yeast, bacterial, etc.

A fusion protein between the lac headpiece dimer and a target protein sequence is constructed, for example, as disclosed by Gates (supra). This construct, containing at least one lac operator, is mutagenized by technologies common in the art such as PCR mutagenesis, chemical mutagenesis, or oligonucleotide-directed mutagenesis (Sambrook et al., Molecular Cloning, CSH Press (1987)). The resulting library is transformed into a host cell, and expression of the fusion protein is induced, preferably with arabinose. An extract or lysate is generated from a culture of the library expressing the construct. Insoluble protein is fractionated from soluble protein/DNA complexes by centrifugation or affinity chromatography, and the yield of soluble protein/DNA complexes is quantitated by quantitative PCR (Sambrook et al., Molecular Cloning, CSH Press, 1987) of the plasmid. Preferably, a reagent that is specific for properly folded protein, such as a monoclonal antibody or a natural ligand, is used to purify soluble protein/DNA complexes. The plasmid DNA from this step is isolated, subjected to RSR, and again expressed. These steps are repeated until the yield of soluble protein/DNA complexes has reached a desired level of improvement. Individual clones are then screened for retention of functional properties of the protein of interest, such as enzymatic activity, etc.
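
The quantitative PCR readout can be converted into a relative yield with the comparative threshold-cycle (Ct) calculation sketched in Python below. This assumes the target is amplified with known, constant efficiency (perfect doubling by default); the source does not specify how the quantitation is performed, so the function and its parameters are hypothetical.

```python
def relative_template(ct_reference, ct_sample, efficiency=1.0):
    """Template amount in a sample relative to a reference, from qPCR Ct values.

    Assumes every cycle multiplies product by (1 + efficiency), i.e. perfect
    doubling when efficiency is 1.0. Each extra cycle needed to reach threshold
    means less starting template.
    """
    return (1.0 + efficiency) ** (ct_reference - ct_sample)


# Hypothetical Ct values: total lysate vs. purified soluble protein/DNA complexes.
ct_total_lysate = 18.0
ct_soluble_fraction = 21.0
print(relative_template(ct_total_lysate, ct_soluble_fraction))  # 0.125
```

In this example, plasmid from the soluble fraction crosses threshold three cycles later than plasmid from the total lysate, so under the doubling assumption about one eighth of the plasmid was recovered in soluble complexes. Comparing this fraction between rounds gives a simple measure of improvement.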

This technique is generically useful for evolving solubility and other properties, such as cellular trafficking, of proteins heterologously expressed in a host cell of interest. For example, one could select for efficient folding and nuclear localization of a protein fused to the lac repressor headpiece dimer by encoding the protein on a plasmid carrying an SV40 origin of replication and a lac operator, and transiently expressing the fusion protein in a mammalian host expressing T antigen. Purification of protein/DNA complexes from nuclear Hirt extracts (Seed and Aruffo, Proc. Natl. Acad. Sci. (U.S.A.), 84:3365-3369 (1987)) would allow one to select for proteins that fold efficiently and localize to the nucleus.

b. Functional Expression of Protein Using Phage Display

A problem often encountered in phage display methods such as those disclosed by O'Neil et al. (Current Biology, 5:443-449 (1995)) is the inability to functionally express a protein of interest on phage. Without being limited to any one theory, improper folding of the protein of interest can be responsible for this problem. RSR can be used to evolve a protein of interest for functional expression on phage. Typically, a fusion protein is constructed between gene III or gene VIII and the target protein and then mutagenized, for example by PCR mutagenesis. The mutagenized library is then expressed in a phage display format, a phage lysate is made, and these phage are affinity selected for those bearing functionally displayed fusion proteins using an affinity matrix containing a known ligand for the target protein. DNA from the functionally selected phage is purified, and the displayed genes of interest are shuffled and recloned into the phage display format. The selection, shuffling and recloning steps are repeated until the yield of phage with functional displayed protein has reached desired levels as defined, for example, by the fraction of phage that are retained on a ligand affinity matrix or the biological activity associated with the displayed phage. Individual clones are then screened to identify candidate mutants with improved display properties, desired level of expression, and functional properties of interest (e.g., ability to bind a ligand or receptor, lymphokine activity, enzymatic activity, etc.).

In some embodiments of the invention, a functional screen or selection is used to identify an evolved protein not expressed on a phage. The target protein, which cannot initially be efficiently expressed in a host of interest, is mutagenized and a functional screen or selection is used to identify cells expressing functional protein. For example, the protein of interest may complement a function in the host cell, cleave a colorimetric substrate, etc. Recursive sequence recombination is then used to rapidly evolve improved functional expression from such a pool of improved mutants.

For example, AMV reverse transcriptase is of particular commercial importance because it is active at a higher temperature (42° C.) and is more robust than many other reverse transcriptases. However, it is difficult to express in prokaryotic hosts such as E. coli, and is consequently expensive because it has to be purified from chicken cells. Thus an evolved AMV reverse transcriptase that can be expressed efficiently in E. coli is highly desirable.

In brief, the AMV reverse transcriptase gene (Papas et al., J. Cellular Biochem. 20:95-103 (1982)) is mutagenized by any method common in the art. The library of mutant genes is cloned into a colE1 plasmid (Amp resistant) under control of the lac promoter in a polA12 (Ts) recA718 (Sweasy et al., Proc. Natl. Acad. Sci. U.S.A. 90:4626-4630 (1993)) E. coli host. The library is induced with IPTG and shifted to the nonpermissive temperature. This selects for functionally expressed reverse transcriptase genes under the selective conditions reported by Kim et al. (Proc. Natl. Acad. Sci. (U.S.A.), 92:684-688 (1995)) for selection of active HIV reverse transcriptase mutants. The selected AMV RT genes are recovered by PCR using oligonucleotides flanking the cloned gene. The resulting PCR products are subjected to in vitro RSR, selected as described above, and the process is repeated until the level of functional expression is acceptable. Individual clones are then screened for RNA-dependent DNA polymerization and other properties of interest (e.g., half life at room temperature, error rate). The candidate clones are subjected to mutagenesis and then tested again to yield an AMV RT that can be expressed in E. coli at high levels.

2. Evolved Chaperonins

In some embodiments of the invention, overexpression of a protein can lead to the accumulation of folding intermediates which have a tendency to aggregate. Without being limited to any one theory, the role of chaperonins is thought to be to stabilize such folding intermediates against aggregation; thus, overexpression of a protein of interest can overwhelm the capacity of chaperonins. Chaperonin genes can be evolved using the techniques of the invention, either alone or in combination with the genes encoding the protein of interest, to overcome this problem.

Examples of proteins of interest which are especially suited to this approach include but are not limited to: cytokines; malarial coat proteins; T cell receptors; antibodies; industrial enzymes (e.g., detergent proteases and detergent lipases); viral proteins for use in vaccines; and plant seed storage proteins.

Sources of chaperonin genes include but are not limited to E. coli chaperonin genes encoding such proteins as thioredoxin, GroES/GroEL, PapD, ClpB, DsbA, DsbB, DnaJ, DnaK, and GrpE; mammalian chaperonins such as Hsp70, Hsp72, Hsp73, Hsp40, Hsp60, Hsp10, Hdj1, TCP-1, Cpn60, and BiP; and the homologues of these chaperonin genes in other species such as yeast (J. G. Wall and A. Pluckthun, Current Biology, 6:507-516 (1995); Hartl, Nature, 381:571-580 (1996)). Additionally, heterologous genomic or cDNA libraries can be used as libraries to select or screen for novel chaperonins.

In general, evolution of chaperonins is accomplished by first mutagenizing chaperonin genes, screening or selecting for improved expression of the target protein of interest, subjecting the mutated chaperonin genes to RSR, and repeating selection or screening. As with all RSR techniques, this is repeated until the desired improvement of expression of the protein of interest is obtained. Two exemplary approaches are provided below.
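
The overall cycle of mutagenesis, screening or selection, and recombination can be illustrated with a toy in silico model in Python. Here a hypothetical fitness function (identity to an arbitrary target sequence) stands in for the screen, random point mutation stands in for mutagenesis, and multi-point crossover between selected parents stands in for RSR. All names and parameters are illustrative assumptions, not a laboratory protocol.

```python
import random

ALPHABET = "ACGT"


def mutagenize(seq, rate, rng):
    """Random point mutations at a per-base rate (stand-in for mutagenic PCR)."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base for base in seq)


def recombine(parent_a, parent_b, rng, crossovers=3):
    """Multi-point crossover between two aligned parents (stand-in for RSR)."""
    points = sorted(rng.sample(range(1, len(parent_a)), crossovers))
    parents = (parent_a, parent_b)
    pieces, source, start = [], 0, 0
    for point in points + [len(parent_a)]:
        pieces.append(parents[source][start:point])
        source, start = 1 - source, point
    return "".join(pieces)


def evolve(start, fitness, rounds=20, pool=200, keep=0.05, rate=0.01, seed=0):
    """Repeat selection, recombination and mutagenesis; return the best and its history."""
    rng = random.Random(seed)
    population = [mutagenize(start, rate, rng) for _ in range(pool)]
    history = []
    for _ in range(rounds):
        ranked = sorted(population, key=fitness, reverse=True)
        selected = ranked[: max(2, int(pool * keep))]  # the screen or selection step
        history.append(fitness(selected[0]))
        offspring = [
            mutagenize(recombine(*rng.sample(selected, 2), rng), rate, rng)
            for _ in range(pool - len(selected))
        ]
        population = selected + offspring  # parents kept, so the best never drops
    return max(population, key=fitness), history


# Hypothetical fitness: identity to an arbitrary 60-base target sequence.
setup_rng = random.Random(42)
TARGET = "".join(setup_rng.choice(ALPHABET) for _ in range(60))
START = "".join(setup_rng.choice(ALPHABET) for _ in range(60))


def fitness(seq):
    return sum(a == b for a, b in zip(seq, TARGET))


best, history = evolve(START, fitness)
print(history[0], history[-1], fitness(best))
```

Because the selected parents are carried into the next round, the best fitness in this model never decreases from round to round; the keep fraction plays the role of the top fraction of clones picked in a screen.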

a. Chaperonin Evolution in Trans to the Protein of Interest with a Screen or Selection for Improved Function

In some embodiments the chaperonin genes are evolved independently of the gene(s) for the protein of interest. The improvement in the evolved chaperonin can be assayed, for example, by screening for enhancement of the activity of the target protein itself or for the activity of a fusion protein comprising the target protein and a selectable or screenable protein (e.g., GFP, alkaline phosphatase or beta-galactosidase).

b. Chaperonin Operon in Cis

In some embodiments, the chaperonin genes and the target protein genes are encoded on the same plasmid, but not necessarily evolved together. For example, a lac headpiece dimer can be fused to the protein target to allow for selection of plasmids which encode soluble protein. Chaperonin genes are provided on this same plasmid (“cis”) and are shuffled and evolved rather than the target protein. Similarly, the chaperonin genes can be cloned onto a phagemid plasmid that encodes a gene III or gene VIII fusion with a protein of interest. The cloned chaperonins are mutagenized and, as with the selection described above, phage expressing functionally displayed fusion protein are isolated on an affinity matrix. The chaperonin genes from these phage are shuffled, and the cycle of selection, mutation and recombination is applied recursively until fusion proteins are efficiently displayed in functional form.

3. Improved Intracellular Localization

Many overexpressed proteins of biotechnological interest are secreted into the periplasm or media to give advantages in purification or activity assays. Optimization for high level secretion is difficult because the process is controlled by many genes, and hence optimization may require multiple mutations affecting the expression level and structure of several of these components. Protein secretion in E. coli, for example, is known to be influenced by many proteins including: a secretory ATPase (SecA), a translocase complex (SecB, SecD, SecE, SecF, and SecY), chaperonins (DnaK, DnaJ, GroES, GroEL), signal peptidases (LepB, LspA, Ppp), specific folding catalysts (DsbA) and other proteins of less well defined function (e.g., Ffh, FtsY) (Sandkvist et al., Curr. Op. Biotechnol. 7:505-511 (1996)). Overproduction of wild type or mutant copies of the genes for these proteins can significantly increase the yield of mature secreted protein. For example, overexpression of secY or secY4 significantly increased the periplasmic yield of mature human IL6 from a hIL6-pre-OmpA fusion (Perez-Perez et al., Bio-Technology 12:178-180 (1994)). Analogously, overexpression of DnaK/DnaJ in E. coli improved the yield of secreted human granulocyte colony stimulating factor (Perez-Perez et al., Biochem. Biophys. Res. Commun. 210:254-259 (1995)).

RSR provides a route to evolution of one or more of the above named components of the secretory pathway. The following strategy is employed to optimize protein secretion in E. coli. Variations on this method, suitable for application to Bacillus subtilis, Pseudomonas, Saccharomyces cerevisiae, Pichia pastoris, mammalian cells and other hosts, are also described. The general protocol is as follows.

One or more of the genes named above are obtained by PCR amplification from E. coli genomic DNA using known flanking sequence, and cloned in an ordered array into a plasmid or cosmid vector. These genes do not in general occur naturally in clusters, and hence these will comprise artificial gene clusters. The genes may be cloned under the control of their natural promoter or under the control of another promoter such as the lac, tac, arabinose, or trp promoters. Typically, rare restriction sites such as Sfi I are placed between the genes to facilitate ordered reassembly of shuffled genes as described in the methods of the invention.

The gene cluster is mutagenized and introduced into a host cell in which the gene of interest can be inducibly expressed. Expression of the target gene to be secreted and of the cloned genes is induced by standard methods for the promoter of interest (e.g., addition of 1 mM IPTG for the lac promoter). The efficiency of protein secretion by a library of mutants is measured, for example by the method of colony blotting (Skerra et al., Anal. Biochem. 196:151-155 (1991)). Those colonies expressing the highest levels of secreted protein (the top 0.1-10%; preferably the top 1%) are picked. Plasmid DNA is prepared from these colonies and shuffled according to any of the methods of the invention.
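
Picking the top fraction of a screened library can be sketched as below, assuming each colony has already been assigned a numeric secretion signal (for example a colony-blot intensity). The function name and the example scores are hypothetical.

```python
def top_fraction(scores, fraction=0.01):
    """IDs of the colonies whose signal falls in the top `fraction` of the library."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(len(scores) * fraction))
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]


# Hypothetical colony-blot intensities for a library of 1,000 colonies.
scores = {f"colony{i}": float(i) for i in range(1000)}
picked = top_fraction(scores)  # top 1%
print(len(picked), picked[0])  # 10 colony999
```

Passing `fraction=0.1` or `fraction=0.001` covers the 10% and 0.1% ends of the range given above; at least one colony is always kept.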

Preferably, each individual gene is amplified from the population and subjected to RSR. The fragments are digested with Sfi I (introduced between each gene with nonpalindromic overhangs designed to promote ordered reassembly by DNA ligase) and ligated together, preferably at low DNA concentration (<1 ng/microliter) to promote formation of covalently closed relaxed circles. Each of the PCR amplified gene populations may be shuffled prior to reassembly into the final gene cluster. The ligation products are transformed back into the host of interest and the cycle of selection and RSR is repeated.

Analogous strategies can be employed in other hosts such as Pseudomonas, Bacillus subtilis, yeast and mammalian cells. The homologs of the E. coli genes listed above are targets for optimization, and indeed many of these homologs have been identified in other species (Pugsley, Microb. Rev. 57:50-108 (1993)). In addition to these homologs, other components such as the six polypeptides of the signal recognition particle, the translocating chain-associating membrane protein (TRAM), BiP, the Ssa proteins and other hsp70 homologs, and prsA (B. subtilis) (Simonen and Palva, Microb. Rev. 57:109-137 (1993)) are targets for optimization by RSR. In general, replicating episomal vectors such as SV40-neo (Sambrook et al., Molecular Cloning, CSH Press (1987); Northrup et al., J. Biol. Chem. 268(4):2917-2923 (1993)) are used for mammalian cells, and 2 micron or ars plasmids (Strathern et al., The Molecular Biology of the Yeast Saccharomyces, CSH Press (1982)) for yeast. Integrative vectors such as pJM 103, pJM 113 or pSGMU2 are preferred for B. subtilis (Perego, Chap. 42, pp. 615-624 in: Bacillus subtilis and Other Gram-Positive Bacteria, A. Sonenshein, J. Hoch, and R. Losick, eds., 1993).

For example, an efficiently secreted thermostable DNA polymerase can be evolved, allowing DNA polymerization assays to be performed with little or no purification of the expressed DNA polymerase. Such a procedure is preferred for expressing libraries of mutants of any protein that one wishes to test in a high-throughput assay, for example any of the pharmaceutical proteins listed in Table I, or any industrial enzyme. Initial constructs are made by fusing a signal peptide, such as that from STII or OmpA, to the amino terminus of the protein to be secreted. A cluster of cloned genes believed to act in the secretory pathway of interest is mutagenized and coexpressed with the target construct. Individual clones are screened for expression of the gene product. The secretory gene clusters from improved clones are recovered, recloned, and introduced back into the original host. Preferably, they are first subjected to mutagenesis before the process is repeated. This cycle is repeated until the desired improvement in expression of secreted protein is achieved.
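The recursive cycle described above (recombine, mutagenize, screen, keep the best, repeat) can be illustrated with a toy simulation. This is a sketch of the loop structure only, not a model of secretion biology: genotypes are hypothetical bit strings in which a 1 marks a beneficial mutation, and the "screen" simply counts them. All parameter values are assumptions.

```python
import random
from typing import List

Genome = List[int]  # toy genotype: 1 = beneficial mutation present at a site


def mutate(seq: Genome, rate: float, rng: random.Random) -> Genome:
    # Toy point mutagenesis: flip each site with probability `rate`.
    return [1 - s if rng.random() < rate else s for s in seq]


def recombine(a: Genome, b: Genome, rng: random.Random) -> Genome:
    # Toy shuffling: each site is inherited from either parent at random.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def screen(seq: Genome) -> int:
    # Hypothetical screen score standing in for secreted-protein yield.
    return sum(seq)


def evolve(cycles: int = 10, library_size: int = 200, sites: int = 40,
           keep: float = 0.05, rate: float = 0.02, seed: int = 1) -> List[int]:
    """Run toy recombination/screening cycles; return the best score per cycle."""
    rng = random.Random(seed)
    parents: List[Genome] = [[0] * sites]  # start from the wild-type cluster
    best_per_cycle = []
    for _ in range(cycles):
        library = [mutate(recombine(rng.choice(parents), rng.choice(parents), rng), rate, rng)
                   for _ in range(library_size)]
        # Parents are rescreened alongside the library, so the best score never drops.
        pool = sorted(library + parents, key=screen, reverse=True)
        parents = pool[:max(1, int(len(pool) * keep))]
        best_per_cycle.append(screen(parents[0]))
    return best_per_cycle


print(evolve())
```

Because recombination lets beneficial mutations from different selected clones combine in one progeny, the best score typically climbs faster than mutagenesis alone would allow.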

IV. Evolved Polypeptide Properties

A. Evolved Transition State Analog and Substrate Binding

There are many enzymes of industrial interest that have substantially suboptimal activity on the substrate of interest. In many of these cases, the enzyme obtained from nature is required either to work under conditions very different from those under which it evolved or to act on a substrate different from its natural substrate.

The application of evolutionary technologies to industrial enzymes is often significantly limited by the types of selections that can be applied and the modest numbers of mutants that can be surveyed in screens. Selection of enzymes or catalytic antibodies, expressed in a display format, for binding to transition state analogs (McCafferty et al., Appl. Biochem. Biotechnol. 47:157-171 (1994)) or substrate analogs (Janda et al., Proc. Natl. Acad. Sci. (U.S.A.) 91:2532-2536 (1994)) represents a general strategy for selecting mutants with improved catalytic efficiency.

Phage display (O'Neil et al., Current Biology 5:443-449 (1995)) and the other display formats described herein (Gates et al., J. Mol. Biol. 255:373-386 (1995); Mattheakis et al., Proc. Natl. Acad. Sci. (U.S.A.) 91:9022-9026 (1994)) represent general methodologies for applying affinity-based selections to proteins of interest. For example, Matthews and Wells (Science 260:1113-1117 (1993)) have used phage display of a protease substrate to select improved substrates. Display of active enzymes on the surface of phage, on the other hand, allows selection of mutant proteins with improved transition state analog binding. Improvements in affinity for transition state analogs correlate with improvements in catalytic efficiency. For example, Patten et al. (Science 271:1086-1091 (1996)) have shown that improvements in the affinity of a catalytic antibody for its hapten are well correlated with improvements in catalytic efficiency, achieving an 80-fold improvement in kcat/Km for an esterolytic antibody.

For example, an enzyme used in antibiotic biosynthesis can be evolved for new substrate specificity and activity under desired conditions using phage display selections. Some antibiotics are currently made by chemical modification of biologically produced starting compounds. Complete biosynthesis of the desired molecules is currently impractical because no enzyme with the required enzymatic activity and substrate specificity is available (Skatrud, TIBTECH 10:324-329, September 1992). For example, 7-aminodeacetoxycephalosporanic acid (7-ADCA) is a precursor for semisynthetically produced cephalosporins. 7-ADCA is made by chemical ring expansion of penicillin G followed by enzymatic deacylation of the phenylacetyl group. 7-ADCA can be made enzymatically from deacetoxycephalosporin V (DAOC V), which could in turn be derived from penicillin V by enzymatic ring expansion if a suitably modified penicillin expandase could be evolved (Cantwell et al., Curr. Genet. 17:213-221 (1990)). Thus, 7-ADCA could in principle be produced enzymatically from penicillin V using a modified penicillin N expandase, such as mutant forms of the S. clavuligerus cefE gene product (Skatrud, TIBTECH 10:324-329, September 1992). However, no known expandase accepts penicillin V as a substrate with sufficient efficiency to be commercially useful. As outlined below, the RSR techniques of the invention can be used to evolve the penicillin expandase encoded by cefE, or other expandases, so that they use penicillin V as a substrate.

Phage display or other display format selections are applied to this problem by expressing libraries of cefE penicillin expandase mutants in a display format, selecting for binding to substrates or transition state analogs, and applying RSR to rapidly evolve high-affinity binders. Candidates are further screened to identify mutants with improved enzymatic activity on penicillin V under desired reaction conditions, such as pH, temperature, solvent concentration, etc. RSR is applied to further evolve mutants with the desired expandase activity. A number of transition state analogs (TSAs) are suitable for this reaction. The following structure is the initial TSA that is used for selection of the display library of cefE mutants:

Libraries of the known penicillin expandases (Skatrud, TIBTECH 10:324-329 (1992); Cantwell et al., Curr. Genet. 17:213-221 (1990)) are made as described herein. The display library is subjected to selection for binding to penicillin V and/or to the transition state analog given above for the conversion of penicillin V to DAOC V. These binding selections may be performed under non-physiological reaction conditions, such as elevated temperature, to obtain mutants that are active under the new conditions. RSR is applied to evolve mutants with a 2- to 10⁵-fold improvement in binding affinity for the selecting ligand. When the desired level of improved binding has been obtained, candidate mutants are expressed in a high-throughput format and their specific activity for expanding penicillin V to DAOC V is quantitatively measured. Recombinants with improved enzymatic activity are mutagenized and the process is repeated to evolve them further.
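Why a few rounds of binding selection suffice can be seen from a simple, commonly used enrichment model, sketched below under the assumption that binders are recovered a constant factor more efficiently than non-binders in every round. The starting frequency and enrichment factor are hypothetical values chosen for illustration, not measurements.

```python
from typing import List


def binder_fraction(f0: float, enrichment: float, rounds: int) -> List[float]:
    """Fraction of binders in the library after each round of affinity selection.

    Assumes a constant enrichment factor per round, i.e. binders are recovered
    `enrichment` times more efficiently than non-binders.
    """
    fractions = [f0]
    f = f0
    for _ in range(rounds):
        recovered_binders = f * enrichment
        f = recovered_binders / (recovered_binders + (1.0 - f))
        fractions.append(f)
    return fractions


# Hypothetical: 1 binder per million displayed mutants, 100-fold enrichment per round.
for rnd, frac in enumerate(binder_fraction(1e-6, 100.0, 3)):
    print(f"round {rnd}: {frac:.2e}")
```

Under these assumptions the odds of picking a binder grow by the enrichment factor each round, so a one-in-a-million binder makes up about half of the library after three rounds.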

Retention of TSA binding by a displayed enzyme (e.g., phage display, lac headpiece dimer, polysome display, etc.) is a good selection for retention of the overall integrity of the active site, and hence can be exploited to select for mutants that retain activity under conditions of interest. Such conditions include, but are not limited to: different pH optima, broader pH optima, activity in altered solvents such as DMSO (Seto et al., DNA Sequence 5:131-140 (1995)) or formamide (Chen et al., Proc. Natl. Acad. Sci. (U.S.A.) 90:5618-5622 (1993)), altered temperature, improved shelf life, altered or broadened substrate specificity, or protease resistance. A further example, the evolution of a p-nitrophenyl esterase using a mammalian display format, is provided below.

B. Improvement of DNA and RNA Polymerases

Of particular commercial importance are improved polymerases for use in nucleic acid sequencing and polymerase chain reactions. The following properties are attractive candidates for improvement of a DNA sequencing polymerase: (1) suppression of termination by inosine in the labelled primer format (H. Dierick et al., Nucleic Acids Res. 21:4427-4428 (1993)); (2) more normalized peak heights, especially with fluorescently labelled dideoxy terminators (Parker et al., BioTechniques 19:116-121 (1995)); (3) better sequencing of high GC content DNA (>60% GC), for example by tolerating >10% DMSO (D. Seto et al., DNA Sequence 5:131-140 (1995); Scheidl et al., BioTechniques 19(5):691-694 (1995)); or (4) improved acceptance of novel base analogs such as inosine, 7-deaza dGTP (Dierick et al., Nucleic Acids Res. 21:4427-4428 (1993)) or other novel base analogs that improve the above properties.

Novel sequencing formats have been described that use matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry to resolve dideoxy ladders (Smith, Nature Biotechnology 14:1084-1085 (1996)). Smith's recent review notes that fragmentation of the DNA is the singular feature limiting the development of this method as a viable alternative to standard gel electrophoresis for DNA sequencing. Base analogs that stabilize the N-glycosidic bond, by modification of the purine bases to 7-deaza analogs (Kirpekar et al., Rapid Comm. in Mass Spec. 9:525-531 (1995)) or of the 2′ hydroxyl (such as 2′-H or 2′-F), “relieve greatly the mass range limitation” of this technique (Smith, 1996). Thus, evolved polymerases that can efficiently incorporate these and other base analogs conferring resistance to fragmentation under MALDI-TOF conditions are valuable innovations.

Other polymerase properties of interest for improvement by RSR include: low fidelity in a thermostable DNA polymerase, for more efficient mutagenesis or as a useful correlate of acceptance of base analogs for the purposes described above; higher fidelity polymerases for PCR (Lundberg et al., Gene 108:1-6 (1991)); higher fidelity reverse transcriptase for retroviral gene therapy vehicles, to reduce mutation of the therapeutic construct and of the retrovirus; and improved PCR of GC-rich DNA and PCR with modified bases (S. Turner and F. J. Jenkins, BioTechniques 19(1):48-52 (1995)).

Thus, in some embodiments of the invention, libraries of mutant polymerase genes are screened by direct high-throughput screening for improved sequencing properties. The best candidates are then subjected to RSR. Briefly, mutant libraries of candidate polymerases such as Taq polymerase are constructed using standard methods such as PCR mutagenesis (Cadwell et al., PCR Meth. App. 2:28-33 (1992)) and/or cassette mutagenesis (Sambrook et al., Molecular Cloning, CSH Press (1987)). Mutations such as the active site residue from T7 polymerase that improves acceptance of dideoxynucleotides (Tabor and Richardson, J. Biol. Chem. 265:8322-8328 (1990)) and mutations that inactivate the 5′-3′ exonuclease activity (R. S. Rano, BioTechniques 18:390-396 (1995)) are incorporated into these Taq DNA polymerase libraries. The reassembly PCR technique described above, for example, is especially suitable for this problem. Similarly, chimeric polymerase libraries are made by breeding existing thermophilic polymerases, Sequenase, and E. coli Pol I with each other using the bridge oligonucleotide methods described above. The libraries are expressed in formats in which human or robotic colony picking is used to replica-pick individual colonies into 96-well plates, where small cultures are grown and polymerase expression is induced.

A simple, small-scale, high-throughput purification of the polymerase expressed in each well is performed. For example, simple single-step purifications of His-tagged Taq expressed in E. coli have been described (Smirnov et al., Russian J. Bioorganic Chem. 21(5):341-342 (1995)) and could readily be adapted to a 96-well expression and purification format.

A high-throughput sequencing assay is used to perform sequencing reactions with the purified samples. The data are analyzed to identify mutants with improved sequencing properties according to any of these criteria: higher quality ladders on GC-rich templates, especially those greater than 60% GC, including fewer artifactual termination products and stronger signals than with the wild-type enzyme; less termination of reactions by inosine in primer-labelled reactions, e.g., with fluorescently labelled primers; less variation in incorporation signals at any given position in reactions with fluorescent dideoxynucleotides; longer sequencing ladders than obtained with the wild-type enzyme (e.g., by about 20 to 100 nucleotides); improved acceptance of other known base analogs such as 7-deaza purines; and improved acceptance of new base analogs from combinatorial chemistry libraries (see, for example, Hogan, Nature 384(Supp):17 (1996)).
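When several of these criteria are scored at once, one conservative way to call a mutant "improved" is to require that it be no worse than wild type on every metric and better on at least one. The sketch below assumes four hypothetical per-mutant metrics extracted from the sequencing data; the metric names, scales and values are illustrative, and other ranking rules (for example weighted scores) would be equally reasonable.

```python
from dataclasses import dataclass, fields


@dataclass
class SeqMetrics:
    read_length: int        # usable ladder length in nt (higher is better)
    gc_quality: float       # ladder quality on a >60% GC template (higher is better)
    peak_height_cv: float   # variation of dideoxy-terminator peak heights (lower is better)
    artifact_stops: int     # artifactual termination products (lower is better)


LOWER_IS_BETTER = {"peak_height_cv", "artifact_stops"}


def improves_on(mutant: SeqMetrics, wild_type: SeqMetrics) -> bool:
    """True if the mutant is no worse than wild type on every metric and better on one."""
    no_worse, strictly_better = True, False
    for f in fields(SeqMetrics):
        m, w = getattr(mutant, f.name), getattr(wild_type, f.name)
        if f.name in LOWER_IS_BETTER:
            m, w = -m, -w  # flip sign so that larger always means better
        if m < w:
            no_worse = False
        elif m > w:
            strictly_better = True
    return no_worse and strictly_better


wt = SeqMetrics(read_length=450, gc_quality=0.60, peak_height_cv=0.35, artifact_stops=12)
candidates = {
    "mut_A": SeqMetrics(520, 0.70, 0.30, 8),
    "mut_B": SeqMetrics(600, 0.50, 0.20, 5),  # longer reads but worse on GC-rich DNA
}
print([name for name, m in candidates.items() if improves_on(m, wt)])
```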

The best candidates are then subjected to mutagenesis, and then selected or screened for the improved sequencing properties described above.

In another embodiment, a screen or selection is performed as follows. The replication of a plasmid can be placed under obligate control of a polymerase expressed in E. coli or another microorganism. The effectiveness of this system has been demonstrated by making plasmid replication dependent on mammalian polymerase beta (Sweasy et al., Proc. Natl. Acad. Sci. (U.S.A.) 90:4626-4630 (1993)), Taq polymerase (Suzuki et al., Proc. Natl. Acad. Sci. (U.S.A.) 93:9670-9675 (1996)), or HIV reverse transcriptase (Kim et al., Proc. Natl. Acad. Sci. (U.S.A.) 92:684-688 (1995)). The mutant polymerase gene is placed on a plasmid bearing a ColE1 origin and expressed under the control of an arabinose promoter. The library is enriched for active polymerases essentially as described by Suzuki et al. (supra), with polymerase expression induced by the presence of arabinose in the culture.

A further quantitative screen places GFP (green fluorescent protein) on the same plasmid, replica plates onto arabinose at the nonpermissive temperature in the absence of a selective antibiotic, and uses a fluorimeter to quantitatively measure the fluorescence of each culture. GFP activity correlates with plasmid stability and copy number, which are in turn dependent on expression of active polymerase.

A polymerase with a very high error rate would be a superior sequencing enzyme, as its reduced specificity and selectivity would give a more normalized signal for incorporation of base analogs such as the currently used fluorescently labelled dideoxynucleotides. The error rates of currently used polymerases are on the order of 10⁻⁵ to 10⁻⁶, orders of magnitude lower than what can be detected given the resolving power of gel systems. An error rate of 1%, and possibly as high as 10%, could not be detected by current gel systems, so there is a large window of opportunity to increase the “sloppiness” of the enzyme. An error-prone cycling polymerase would have other uses, such as hypermutagenesis of genes by PCR.
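For the hypermutagenesis use just mentioned, a rough estimate of mutational load helps when choosing how many PCR doublings to run. The sketch below relies on the common approximation that the per-base mutation frequency after d doublings is about (error rate × d), valid only while that product is small, and assumes mutations per copy are Poisson distributed. The gene length, doublings and error rates are hypothetical inputs.

```python
import math


def mean_mutations_per_gene(error_rate: float, gene_length: int, doublings: float) -> float:
    """Approximate mean number of mutations per amplified gene copy.

    Uses the common approximation that the per-base mutation frequency after
    `doublings` rounds of duplication is about error_rate * doublings, which is
    reasonable only while that product is small.
    """
    return error_rate * gene_length * doublings


def fraction_unmutated(mean_mutations: float) -> float:
    """Fraction of copies with no mutation, assuming a Poisson distribution."""
    return math.exp(-mean_mutations)


# Hypothetical 1 kb gene amplified through 10 doublings.
for rate in (1e-5, 1e-3, 1e-2):
    lam = mean_mutations_per_gene(rate, gene_length=1000, doublings=10)
    print(f"error rate {rate:g}: ~{lam:g} mutations/gene, "
          f"{fraction_unmutated(lam):.3g} of copies unmutated")
```

At the error rates of current polymerases most copies come through unchanged, whereas at 1% even a few doublings saturate a gene with mutations, so the number of cycles becomes the main control over mutational load.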

In some embodiments, the system described by Suzuki (Suzuki et al., Proc. Natl. Acad. Sci. (U.S.A.) 93:9670-9675 (1996)) is used to make replication of a reporter plasmid dependent on the expressed polymerase. This system puts replication of the first 200-300 bases next to the ColE1 origin directly under the control of the expressed polymerase (Sweasy and Loeb, J. Bact. 177:2923-2925 (1995); Sweasy et al., Proc. Natl. Acad. Sci. (U.S.A.) 90:4626-4630 (1993)). A screenable or selectable reporter gene containing stop codons, such as lacZ alpha containing one, two or three stop codons, is positioned in this region. The constructs are grown on arabinose at the nonpermissive temperature, allowed to recover, and plated on selective lactose minimal medium that demands reversion of the stop codons in the reporter cassette. Mutant polymerases are recovered from the survivors by PCR. The survivors are enriched for mutators, because a mutator phenotype increases the rate of reversion of the stop codons in the reporter lacZ alpha fragment.

The polymerase genes from the survivors are subjected to RSR, and the polymerase mutants are retransformed into the indicator strain. Mutators can be visually screened by plating on arabinose/X-gal plates at the nonpermissive temperature. Mutator polymerases give rise to colonies with a high frequency of blue papillae due to reversion of the stop codon(s). Candidate papillators can be rescreened by picking a non-papillating region of the most heavily papillated colonies (i.e., the “best” colonies) and replating on the arabinose/X-gal indicator medium to further screen for colonies with increased papillation rates. These steps are repeated until a desired reversion rate is achieved (e.g., 10⁻² to 10⁻³ mutations per base pair per replication).
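The choice of one, two or three stop codons in the reporter trades sensitivity against discrimination. As a deliberately simplified sketch, assume each stop codon reverts independently with the same probability per replication and that all of them must revert for the reporter to be active; the rates below are hypothetical and ignore that only particular substitutions restore a sense codon.

```python
def reversion_frequency(per_codon_rate: float, n_stops: int) -> float:
    """Probability that all `n_stops` stop codons in the reporter revert.

    Simplifying assumption: each stop codon reverts independently with
    probability `per_codon_rate` per replication.
    """
    return per_codon_rate ** n_stops


background_rate, mutator_rate = 1e-6, 1e-3  # hypothetical per-codon reversion rates
for n in (1, 2, 3):
    ratio = reversion_frequency(mutator_rate, n) / reversion_frequency(background_rate, n)
    print(f"{n} stop codon(s): mutator revertants are {ratio:.0e}x more frequent")
```

Under this model each added stop codon multiplies the mutator-to-background ratio by the same factor, but also lowers the absolute revertant frequency, so fewer stop codons keep papillae visible while more stop codons sharpen the distinction between strong mutators and background.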

Colonies that exhibit high-frequency papillation are candidates for encoding an error-prone polymerase. These candidates are screened for improved sequencing properties essentially as in the high-throughput screen described above. Briefly, mutant Taq proteins are expressed and purified in a 96-well format. The purified proteins are used in sequencing reactions, and the sequence data are analyzed to identify mutants that exhibit the improvements outlined herein. Mutants with improved properties are subjected to RSR and rescreened for further improvements in function.

In some embodiments, GFP containing stop codons is used for the construction instead of lacZ alpha with stop codons. Cells with reverted stop codons in GFP are selected by fluorescence-activated cell sorting (FACS). In general, FACS selection is performed by gating on roughly the brightest 0.1-10% of cells, preferably the top 0.1-1%, which are collected according to a protocol similar to that of Dangl et al. (Cytometry 2(6):395-401 (1982)). In other embodiments, the polA gene is flanked with lox sites or other targets of a site-specific recombinase. The recombinase is induced, allowing one to inducibly delete the polA gene (Mulbery et al., Nucleic Acid Res. 23:485-490 (1995)). This would allow one to perform “Loeb-type” selections at any temperature and in any host. For example, one could set up such a selection in a recA-deficient mesophile or thermophile by placing the polA homologue in an inducibly deletable format, and thus apply the selection for active polymerase under more general conditions.

In further embodiments, this general system is preferred for directed in vivo mutagenesis of genes. The target gene is cloned into the region near a plasmid origin of replication that puts its replication obligately under control of the error-prone polymerase. The construct is passaged through a polA(ts) recA strain and grown at the nonpermissive temperature, thus specifically mutagenizing the target gene while replicating the rest of the plasmid with high fidelity.

In other embodiments, selection is based on the ability of mutant DNA polymerases to PCR amplify DNA under altered conditions or by utilizing base analogs. The mutant polymerases act on the template that encodes them in a PCR amplification, thus differentially replicating those polymerases.

In brief, an initial library of mutants is replica plated. Polymerase preparations are made in a 96-well format, and crude plasmid preparations are made of the same set. Each plasmid prep is PCR-amplified using the polymerase prep derived from that plasmid under the conditions for which one wishes to optimize the polymerase (e.g., added DMSO or formamide; altered temperature of denaturation or extension; altered buffer salts; PCR with base analogs such as α-thiol dNTPs for use with mass spectrometry sequencing; PCR of GC-rich DNA (>60% GC); PCR with novel base analogs such as 7-deaza purines, 2′-fluoro dNTPs, or rNTPs; PCR with inosine; etc.). The amplified genes are pooled, cloned, and subjected to mutagenesis, and the process is repeated until an improvement is achieved.
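The self-amplification selection works because each polymerase amplifies only its own gene, so differences in per-cycle efficiency under the chosen condition compound over the reaction before the products are pooled. The sketch below assumes equal template input per well and ideal exponential amplification; real reactions plateau, which compresses these differences. The mutant names and efficiencies are hypothetical.

```python
from typing import Dict


def pooled_fractions(efficiency: Dict[str, float], cycles: int) -> Dict[str, float]:
    """Share of the pooled PCR product contributed by each mutant.

    Assumes equal template input per well and ideal exponential amplification:
    yield = (1 + efficiency) ** cycles, with efficiency in [0, 1] per cycle.
    """
    yields = {name: (1.0 + e) ** cycles for name, e in efficiency.items()}
    total = sum(yields.values())
    return {name: y / total for name, y in yields.items()}


# Hypothetical per-cycle efficiencies under the chosen condition (e.g. added DMSO).
shares = pooled_fractions({"wild_type": 0.5, "mut_A": 0.9, "inactive": 0.0}, cycles=20)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.4f}")
```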

C. Evolved Phosphonatase

Alkaline phosphatase is a widely used reporter enzyme for ELISA assays and protein fusion assays and, in a secreted form, as a reporter gene for mammalian cells. The chemical lability of p-nitrophenyl phosphate (pNPP) substrates and the existence of cellular phosphatases that cross-react with pNPP are important limitations on the sensitivity of assays using this reporter gene. A reporter gene with superior signal-to-noise properties can be developed based on hydrolysis of p-nitrophenyl phosphonates, which are far more stable to base-catalyzed hydrolysis than the corresponding phosphates. Additionally, there are far fewer naturally occurring cellular phosphonatases than alkaline phosphatases. Thus, a p-nitrophenyl phosphonatase is an attractive replacement for alkaline phosphatase because the background due to chemical and enzymatic hydrolysis is much lower. This will allow ELISAs to be made more sensitive for detecting very small concentrations of antigen.

Chen et al. (J. Mol. Biol. 234:165-178 (1993)) have shown that a Staph. aureus beta-lactamase can hydrolyze p-nitrophenyl phosphonate esters with single-turnover kinetics. The active site Ser70 (the active site nucleophile for beta-lactam hydrolysis) forms a covalent intermediate with the substrate. This is analogous to the first step in hydrolysis of beta-lactams, and this enzyme can be evolved by RSR to hydrolyze phosphonates by a mechanism analogous to beta-lactam hydrolysis. Metcalf and Wanner have described a cryptic phosphonate-utilizing operon (phn) in E. coli and have constructed strains bearing deletions of the phn operon (J. Bact. 175:3430-3442 (1993)). This paper discloses selections for growth of E. coli on phosphate-free minimal media in which the phosphorus is derived from hydrolysis of alkyl phosphonates by genes in the phn operon. Thus, one could select for active evolved p-nitrophenyl phosphonatases using biochemical selections on defined minimal media. Specifically, an efficient phosphonatase is evolved as follows. A library of mutants of the Staph. aureus beta-lactamase or of one of the E. coli phn enzymes is constructed. The library is transformed into E. coli mutants in which the phn operon has been deleted and selected for growth on phosphate-free MOPS minimal medium containing p-nitrophenyl phosphonate. RSR is applied to selected mutants to further evolve the enzyme for improved hydrolysis of p-nitrophenyl phosphonates.

D. Evolved Detergent Proteases

Proteases and lipases are added in large quantities to detergents to enzymatically degrade protein and lipid stains on clothes. The incorporation of these enzymes into detergents has significantly reduced the need for surfactants in detergents, with a consequent reduction in the cost of formulation of detergents and improvement in stain removal properties. Proteases with improved specific activity, improved range of protein substrate specificity, improved shelf life, improved stability at elevated temperature, and reduced requirements for surfactants would add value to these products.

As an example, subtilisin can be evolved as follows. The cloned subtilisin gene (von der Osten et al., J. Biotechnol. 28:55-68 (1993)) can be subjected to RSR using growth selections on complex protein media, since secreted subtilisin degrades the complex protein mixture. More specifically, libraries of subtilisin mutants are constructed in an expression vector that directs secretion of the mutant protein by Bacillus subtilis. Bacillus hosts transformed with the libraries are grown in minimal media with a complex protein formulation as the carbon and/or nitrogen source. Subtilisin genes are recovered from fast growers and subjected to RSR, then screened for improvement in a desired property.

E. Escape of Phage from a “Protein Net”

In some embodiments, selection for improved proteases is performed as follows. A library of mutant protease genes is constructed on a display phage, and the phage are grown in a multiwell format or on plates. The phage are overlaid with a “protein net” that ensnares them. The net can consist of a protein or proteins engineered with surface disulphides and then crosslinked with a library of peptide linkers. A further embodiment employs an auxiliary matrix to further trap the phage. The phage are further incubated, then washed to collect liberated phage, i.e., those whose displayed protease was able to free them from the protein net. The protease genes are then subjected to RSR for further evolution. A further embodiment employs a library of proteases encoded by, but not displayed on, a phagemid in which streptavidin is fused to pIII by a peptide linker. The library of protease mutants is evolved to cleave the linker by selecting phagemids on a biotin column between rounds of amplification.

In a further embodiment, the protease is not necessarily provided in a display format. The host cells secrete the protease, which is encoded by, but not surface-displayed by, a phagemid, while constrained to a well, for example, in a microtiter plate. The phage display format is preferred when an entire high-titre lysate is encased in a protein net matrix, so that phage expressing active, broad-specificity proteases digest the matrix and are liberated for the next round of amplification, mutagenesis, and selection.

In a further embodiment, the phage are not constrained to a well; rather, protein-binding filters are used to make colony or plaque lifts, which are screened for activity with chromogenic or fluorogenic substrates. Colonies or plaques corresponding to positive spots on the filters are picked, and the encoded protease genes are recovered by, for example, PCR. The protease genes are then subjected to RSR for further evolution.

F. Screens for Improved Protease Activity

Peptide substrates containing fluorophores attached to the carboxy terminus and fluorescence-quenching moieties on the amino terminus, such as those described by Holskin et al. (Anal. Biochem. 227:148-155 (1995)) (e.g., 4-(4′-dimethylaminophenylazo)benzoyl-Arg-Gly-Val-Val-Asn-Ala-Ser-Ser-Arg-Leu-Ala-5-[(2′-aminoethyl)amino]naphthalene-1-sulfonic acid), are used to screen protease mutants for broadened or altered specificity. In brief, a library of peptide substrates is designed with a fluorophore on the amino terminus and a potent fluorescence quencher on the carboxy terminus, or vice versa. Supernatants containing secreted proteases are incubated either separately with various members of the library or with a complex cocktail. Proteases that are highly active and have broad specificity will cleave the majority of the peptides, releasing the fluorophore from the quencher and giving a positive signal on a fluorimeter. This technique is amenable to a high-density multiwell format.
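When supernatants are incubated separately with individual library members, breadth of specificity can be scored as the fraction of substrates whose fluorescence rises well above a no-enzyme control. The sketch below assumes one reading per substrate and an arbitrary 3-fold threshold; both the threshold and the readings are hypothetical.

```python
from typing import Dict, List


def cleaved_substrates(signal: Dict[str, float], background: Dict[str, float],
                       fold: float = 3.0) -> List[str]:
    """Substrates whose fluorescence is at least `fold` times the no-enzyme control."""
    return sorted(p for p, s in signal.items() if s >= fold * background[p])


def specificity_breadth(signal: Dict[str, float], background: Dict[str, float],
                        fold: float = 3.0) -> float:
    """Fraction of library substrates scored as cleaved (1.0 = all cleaved)."""
    return len(cleaved_substrates(signal, background, fold)) / len(signal)


# Hypothetical fluorimeter readings (arbitrary units) for four quenched peptides.
no_enzyme = {"pep1": 100.0, "pep2": 120.0, "pep3": 90.0, "pep4": 110.0}
supernatant = {"pep1": 900.0, "pep2": 150.0, "pep3": 400.0, "pep4": 500.0}
print(cleaved_substrates(supernatant, no_enzyme))  # ['pep1', 'pep3', 'pep4']
print(specificity_breadth(supernatant, no_enzyme))  # 0.75
```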

G. Improving Pharmaceutical Proteins Using RSR

Table I lists proteins that are of particular commercial interest to the pharmaceutical industry. These proteins are all candidates for RSR evolution to improve function, such as ligand binding, shelf life, reduction of side effects through enhanced specificity, etc. All are well suited to manipulation by the techniques of the invention. Additional embodiments especially applicable to this list are described below.

First, high-throughput methods for expressing and purifying libraries of mutant proteins, similar to the methods described above for Taq polymerase, are applied to the proteins of Table I. These mutants are screened for activity in a functional assay. For example, mutants of IL2 are screened for resistance to plasma or tissue proteases with retention of activity on the low affinity IL2 receptor but loss of activity on the high affinity IL2 receptor. The genes from mutants with improved activity relative to wild type are recovered and subjected to RSR to improve the phenotype further.

Preferably, the libraries are generated in a display format such that the mature folded protein is physically linked to the genetic information that encodes it. Examples include phage display using filamentous phage (O'Neil et al., Current Biology 5:443-449 (1995)) or bacteriophage lambda gene V display (Dunn, J. Mol. Biol. 248:497-506 (1995)); peptides on plasmids (Gates et al., J. Mol. Biol. 255:373-386 (1995)), where the polypeptide of interest is fused to a lac headpiece dimer and the nascent translation product binds to a lac operator site encoded on the plasmid or PCR product; and polysome display (Mattheakis et al., Proc. Natl. Acad. Sci. (U.S.A.) 91:9022-9026 (1994)), where ribosomes are stalled on mRNA molecules such that the nascent polypeptide is exposed for interaction with cognate ligands without disrupting the stalled ribosome/mRNA complex. Selected complexes are subjected to RT-PCR to recover the genes.

When so displayed, affinity selection of the recombinant phage is often performed using a receptor for the protein of interest. In some cases it is impractical to obtain purified receptor that retains all desired biological characteristics (for example, 7-transmembrane (7-TM) receptors). In such cases, cells expressing the receptor can be used as the panning substrate. For example, Barry et al. (Nat. Med. 2:299-305 (1996)) have described successful panning of M13 libraries against whole cells to obtain phage that bind to cells expressing a receptor of interest. This format could be generally applied to any of the proteins listed in Table I.

In some embodiments, the following method can be used for selection. A lysate of phage encoding IFN alpha mutants, for example, can be used directly at a suitable dilution to stimulate cells carrying a GFP reporter construct (Crameri et al., Nature Biotech. 14:315-319 (1996)) under the control of an IFN-responsive promoter, such as an MHC class I promoter. Phage that remain attached to the responsive cells after stimulation and reporter expression can be recovered by FACS purification of those cells. Preferably, the brightest cells are collected. The phage are collected and their DNA is subjected to RSR until the desired level of improvement is achieved.

Thus, for example, IL-3 is prepared in one of these display formats and subjected to RSR to evolve an agonist with a desired level of activity. A library of IL-3 mutants on a filamentous phage vector is created and affinity selected (“panned”) against purified IL-3 receptor to obtain mutants with improved affinity. The mutant IL-3 genes are recovered by PCR, subjected to RSR, and recloned into the display vector. The cycle is repeated until the desired affinity or agonist activity is achieved.

Many proteins of interest are expressed as dimers or higher order multimeric forms. In some embodiments, the display formats described above are preferably applied to a single-chain version of the protein. Mutagenesis, such as RSR, can be used in these display formats to evolve improved single-chain derivatives of multimeric factors that initially have low but detectable activity. This strategy is described in more detail below.

H. Whole Cell Selections

In some embodiments, the eukaryotic cell is the unit of biological selection. The following general protocol can be used to apply RSR to the improvement of proteins using eukaryotic cells as the unit of selection: (1) transfection of libraries of mutants into a suitable host cell; (2) expression of the encoded gene product(s), either transiently or stably; (3) functional selection for cells with an improved phenotype (expression of a receptor with improved affinity for a target ligand, viral resistance, etc.); (4) recovery of the mutant genes, for example, by PCR or by preparation of Hirt supernatants with subsequent transformation of E. coli; (5) RSR; and (6) repetition of steps (1)-(5) until the desired degree of improvement is achieved.

For example, previous work has shown that mammalian surface display can be used to functionally select cells expressing cloned genes, such as by using an antibody to clone the gene for an expressed surface protein (reviewed by Seed, Curr. Opin. Biotechnol. 6:567-573 (1995)). Briefly, cells are transiently transfected with libraries of cloned genes residing on replicating episomal vectors. An antibody directed against the protein of interest (whose gene one wishes to clone) is immobilized on a solid surface such as a plastic dish, and the transfected cells expressing the protein of interest are affinity selected.

For example, the affinity of an antibody for a ligand can be improved using mammalian surface display and RSR. Antibodies with higher affinity for their cognate ligands are then screened for improvement of one or more of the following properties: (1) improved therapeutic properties (increased cell killing, neutralization of ligands, activation of signal transduction pathways by crosslinking receptors), (2) improved in vivo imaging applications (detection of the antibody by covalent/noncovalent binding of a radionuclide or any agent detectable outside of the body by noninvasive means, such as NMR), (3) improved analytical applications (ELISA detection of proteins or small molecules), and (4) improved catalysts (catalytic antibodies). The methods described are general and can be extended to any receptor-ligand pair of interest. A specific example is provided in the experimental section.

The use of a one mutant sequence-one transfected cell protocol is a preferred design feature for RSR-based protocols because the aim is to use functional selection to identify mutants with improved phenotypes; if the transfection is not done in a “clonal” fashion, the functional phenotype of any given cell is the result of the sum of many transfected sequences. Protoplast fusion is one method to achieve this end, since each protoplast typically contains more than 50 copies of a single plasmid variant. However, it is a relatively low efficiency process (about 10³-10⁴ transfectants), and it does not work well on some non-adherent cell lines such as B cell lines. Retroviral vectors provide a second alternative, but they are limited in the size of acceptable insert (<10 kb), and consistent, high expression levels are sometimes difficult to achieve. Random integration results in varying expression levels, thus introducing noise and limiting one's ability to distinguish between improvements in the affinity of the mutant protein and increased expression. A related class of strategies that can be used effectively to achieve “one gene-one cell” DNA transfer and consistent expression levels for RSR is to use a viral vector which contains a lox site and to introduce this into a host that expresses cre recombinase, preferably transiently, and contains one or more lox sites integrated into its genome, thus limiting the variability of integration sites (Rohlman et al., Nature Biotech. 14:1562-1565 (1996)).

An alternative strategy is to transfect with limiting concentrations of plasmid (i.e., about one copy per cell) using a vector that can replicate in the target cells, as is the case with plasmids bearing SV40 origins transfected into COS cells. This strategy requires that either the host cell or the vector supply a replication factor such as SV40 large T antigen. Northrup et al. (J. Biol. Chem. 268:2917-2923 (1993)) describe a strategy wherein a stable transfectant expressing SV40 large T antigen is subsequently transfected with vectors bearing SV40 origins. This format gave consistently higher transient expression and demonstrable plasmid replication, as assayed by resistance to digestion by Dpn I. Transient expression (i.e., non-integrating plasmids) is a preferred format for cellular display selections because it reduces the cycle time and increases the number of mutants that can be screened.
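The benefit of transfecting at limiting plasmid concentrations can be estimated with a short calculation. The sketch below assumes, as a simplification not stated in the text, that the number of plasmid copies taken up per cell follows a Poisson distribution with mean m; it reports the fraction of cells receiving no plasmid, exactly one copy, or several copies, and the fraction of transfected cells that are clonal.

```python
import math

def poisson_pmf(k, mean):
    """Probability that a cell takes up exactly k plasmid copies (Poisson assumption)."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def copy_number_profile(mean):
    """Return (untransfected, single copy, multiple copies, clonal fraction of transfected)."""
    p0 = poisson_pmf(0, mean)
    p1 = poisson_pmf(1, mean)
    p_multi = 1.0 - p0 - p1
    clonal_fraction = p1 / (1.0 - p0)  # transfected cells carrying one variant only
    return p0, p1, p_multi, clonal_fraction

for m in (0.1, 0.5, 1.0, 2.0):
    p0, p1, pm, clonal = copy_number_profile(m)
    print(f"m={m}: none={p0:.3f} one={p1:.3f} several={pm:.3f} clonal={clonal:.3f}")
```

Under this assumption, about 37% of cells receive exactly one copy at m = 1 while about 26% receive two or more; lowering m raises the clonal fraction of transfected cells at the cost of more untransfected cells.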

The expression of SV40 large T antigen or other replication factors may have deleterious effects on, or may work inefficiently in, some cells. In such cases, RSR is applied to the replication factor itself to evolve mutants with improved activity in the cell type of interest. A generic protocol for evolving such a factor is as follows:

The target cell is transfected with a vector containing SV40 large T antigen, an SV40 origin, and a reporter gene such as GFP; a related format is cotransfection with limiting amounts of the SV40 large T antigen expression vector and an excess of a reporter such as GFP cloned onto an SV40 origin-containing plasmid. Typically after 1-10 days of transient expression, the brightest cells are purified by FACS. SV40 large T antigen mutants are recovered by PCR and subjected to mutagenesis. The cycle is repeated until the desired level of improvement is obtained.
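The generic whole-cell cycle described in this section (library, expression, functional selection, gene recovery, RSR, repetition) can be illustrated with a toy in-silico model. This is only a sketch under stated assumptions: variants are bit strings, a hypothetical scoring function stands in for the cellular phenotype, keeping the top-scoring fraction stands in for FACS or affinity selection, and uniform crossover with rare point mutation stands in for RSR.

```python
import random

def score(variant, optimum):
    """Hypothetical phenotype: number of positions matching an optimal sequence."""
    return sum(a == b for a, b in zip(variant, optimum))

def recombine(parent_a, parent_b, rng, mutation_rate=0.01):
    """Uniform crossover plus rare point mutation, a stand-in for RSR."""
    child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
    return [1 - bit if rng.random() < mutation_rate else bit for bit in child]

def evolve(optimum, library_size=200, keep_fraction=0.1, max_rounds=50, seed=0):
    """Run selection/recombination rounds; return (rounds used, best score)."""
    rng = random.Random(seed)
    # (1)-(2): an initial library, one variant per "cell"
    library = [[rng.randint(0, 1) for _ in optimum] for _ in range(library_size)]
    best = 0
    for round_number in range(1, max_rounds + 1):
        # (3): functional selection keeps the top-scoring fraction
        library.sort(key=lambda v: score(v, optimum), reverse=True)
        survivors = library[: max(2, int(keep_fraction * library_size))]
        best = score(survivors[0], optimum)
        # (6): stop once the desired degree of improvement is reached
        if best == len(optimum):
            return round_number, best
        # (4)-(5): recover the selected genes and recombine them into a new library
        library = [recombine(*rng.sample(survivors, 2), rng) for _ in range(library_size)]
    return max_rounds, best

print(evolve([1] * 40))
```

The point of the model is the structure of the loop, not the numbers: selection and recombination alternate, and the loop ends when the desired phenotype is reached or a round limit is hit.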

I. Autocrine Selection

In some embodiments, mutant proteins are selected or screened based on their ability to exert a biological effect in an autocrine fashion on the cell expressing the mutant protein. For example, a library of alpha interferon genes can be selected for induction of more potent or more specific antiviral activity as follows. A library of interferon alpha mutants is generated in a vector which allows for induction of expression (e.g., under control of a metallothionein promoter) and efficient secretion in a multiwell format (96-well, for example) with one or a few independent clones per well. In some embodiments, the promoter is not inducible, and may be constitutive.

Expression of the cloned interferon genes is induced. The cells are challenged with a cytotoxic virus against which one wishes to evolve an optimized interferon (for example, vesicular stomatitis virus or HIV). Surviving cells are recovered. The cloned interferon genes are recovered by PCR amplification, subjected to RSR, cloned back into the transfection vector, and retransfected into the host cells. These steps are repeated until the desired level of antiviral activity is evolved.

In some embodiments, the virus of interest is not strongly cytotoxic. In this case, a conditionally lethal gene is cloned into the virus, and after challenge with virus and recovery, conditionally lethal selective conditions are applied to kill cells that are infected with virus. An example of a conditionally lethal gene is herpes simplex virus thymidine kinase (TK), which becomes lethal upon treating cells expressing this gene with the guanosine analog acyclovir. In some embodiments, the antiproliferative activity of the cloned interferons is selected by treating the cells with agents that kill dividing cells (for example, DNA alkylating agents).

In some embodiments, potent cytokines are selected by expressing and secreting a library of cytokines in cells that have GFP or another reporter under control of a promoter that is induced by the cytokine, such as the MHC class I promoter, which is induced by evolved variants of alpha interferon. The signal transduction pathway is configured such that the wild type cytokine to be evolved gives a weak but detectable signal.

J. Half Life in Serum

In some embodiments of the invention, proteins are evolved by RSR to have improved half-life in serum. A preferred method for improving half-life is evolving the affinity of a protein of interest for a long-lived serum protein, such as an antibody or other abundant serum protein. An example of how affinity for an antibody can enhance serum half-life is the co-administration of IL2 and anti-IL2 antibodies, which increases the serum half-life and anti-tumor activity of human recombinant IL2 (Courtney et al., Immunopharmacology 28:223-232 (1994)).

The eight most abundant human serum proteins are serum albumin, immunoglobulins, lipoproteins, haptoglobin, fibrinogen, transferrin, alpha-1 antitrypsin, and alpha-2 macroglobulin (Doolittle, chapter 6, The Plasma Proteins, F. Putnam, ed.; Academic Press, 1984). These and other abundant serum proteins such as ceruloplasmin and fibronectin are the primary targets against which to evolve binding sites on therapeutic proteins, such as those listed in Table I, for the purpose of extending half-life. In the case of antibodies, the preferred strategy is to evolve affinity for constant regions rather than variable regions in order to minimize individual variation in the concentration of the relevant target epitope (antibody V region usage varies significantly between individuals).

Binding sites of the desired affinity are evolved by applying phage display, peptides on plasmid display or polysome display selections to the protein of interest. One could either mutagenize an existing binding site or otherwise defined region of the target protein, or append a peptide library to the N terminus, to the C terminus, or internally as a functionally nondisruptive loop.

In other embodiments of the invention, half-life is improved by derivatization with PEG, other polymer conjugates or half-life extending chemical moieties. These are established methods for extending the half-life of therapeutic proteins (R. Duncan, Clin. Pharmacokinet. 27:290-306 (1994); Smith et al., TIBTECH 11:397-403 (1993)) and can have the added benefit of reducing immunogenicity (R. Duncan, Clin. Pharmacokinet. 27:290-306 (1994)). However, derivatization can also result in reduced affinity of the therapeutic protein for its receptor or ligand. RSR is used to discover alternative sites in the primary sequence that can be substituted with lysine or other appropriate residues for chemical or enzymatic conjugation with half-life extending chemical moieties, and which result in proteins with maximal retention of biological activity.

A preferred strategy is to express a library of mutants of the protein in a display format, derivatize the library with the agent of interest (e.g., PEG) using chemistry that does not biologically inactivate the display system, select based on affinity for the cognate receptor, PCR amplify the genes encoding the selected mutants, shuffle, reassemble, reclone into the display format, and iterate until a mutant with the desired activity post-modification is obtained. An alternative format is to express, purify and derivatize the mutants in a high throughput format, screen for mutants with optimized activity, recover the corresponding genes, subject the genes to RSR and repeat.

In further embodiments of the invention, binding sites for target human proteins that are localized in particular tissues of interest are evolved by RSR. For example, an interferon can be engineered to contain a binding site for a liver surface protein such as hepatocyte growth factor receptor so that it localizes efficiently to the liver. Analogously, one could evolve affinity for abundant epitopes on erythrocytes, such as ABO blood group antigens, to localize a given protein to the bloodstream.

In further embodiments of the invention, the protein of interest is evolved to have increased stability to proteases. For example, the clinical use of IL2 is limited by serious side effects that are related to the need to administer high doses. High doses are required because IL2 has a short half-life (3-5 min; Lotze et al., JAMA 256(22):3117-3124 (1986)), so large amounts must be given to maintain a therapeutic level. One of the factors contributing to the short half-lives of therapeutic proteins is proteolysis by serum and kidney proteases. Cathepsin D, a major renal acid protease, is responsible for the degradation of IL2 in Balb/c mice (Ohnishi et al., Cancer Res. 50:1107-1112 (1990)). Furthermore, Ohnishi showed that treatment of Balb/c mice with pepstatin, a potent inhibitor of this protease, prolongs the half-life of recombinant human IL2 and augments lymphokine-activated killer cell activity in this mouse model.

Thus, evolving variants of IL2, or of any of the proteins listed in Table I, that are resistant to serum or kidney proteases is a preferred strategy for obtaining variants with extended serum half-lives.
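The dose burden imposed by a short serum half-life can be illustrated with a simple calculation. The sketch assumes first-order elimination, which is a simplification; it uses a 4 min half-life from within the cited 3-5 min range and a hypothetical tenfold longer half-life for comparison.

```python
def fraction_remaining(minutes, half_life_minutes):
    """Fraction of a dose left in serum after `minutes`, assuming first-order elimination."""
    return 0.5 ** (minutes / half_life_minutes)

def dose_multiplier(minutes, half_life_minutes):
    """Fold excess of starting dose needed to still have the target level after `minutes`."""
    return 1.0 / fraction_remaining(minutes, half_life_minutes)

for t_half in (4.0, 40.0):  # 4 min: within the cited 3-5 min; 40 min: hypothetical variant
    print(f"t1/2={t_half} min: remaining after 60 min={fraction_remaining(60, t_half):.2e}, "
          f"dose multiplier={dose_multiplier(60, t_half):.1f}")
```

With a 4 min half-life, only about 3 in 100,000 molecules remain after an hour, so without redosing a starting dose roughly 33,000-fold above the target level would be needed; a tenfold longer half-life reduces that factor to under 3.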

A preferred protocol is as follows. A library of the mutagenized protein of interest is expressed in a display system with a gene-distal epitope tag (e.g., on the N-terminus of a phage display construct, such that if it is cleaved off by proteases, the epitope tag is lost). The expressed proteins are treated with defined proteases or with complex cocktails such as whole human serum. Affinity selection with an antibody to the gene-distal tag is performed. A second selection demanding biological function (e.g., binding to cognate receptor) is performed. Phage retaining the epitope tag (and hence protease resistant) are recovered and subjected to RSR. The process is repeated until the desired level of resistance is attained.
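The logic of the two sequential selections can be summarized as a filter. In this minimal sketch, the `DisplayedVariant` record and its fields are hypothetical; a variant is carried forward to RSR only if its gene-distal tag survived protease treatment and it still binds the cognate receptor, which excludes both cleaved variants and intact but inactive ones.

```python
from dataclasses import dataclass

@dataclass
class DisplayedVariant:
    """Hypothetical record for one displayed mutant after protease treatment."""
    name: str
    tag_retained: bool    # gene-distal epitope tag survived (i.e., not cleaved)
    binds_receptor: bool  # passes the functional (receptor binding) selection

def select_protease_resistant(variants):
    """Apply the anti-tag selection, then the functional selection."""
    tagged = [v for v in variants if v.tag_retained]
    return [v for v in tagged if v.binds_receptor]

library = [
    DisplayedVariant("cleaved", tag_retained=False, binds_receptor=True),
    DisplayedVariant("intact_inactive", tag_retained=True, binds_receptor=False),
    DisplayedVariant("resistant_active", tag_retained=True, binds_receptor=True),
]
print([v.name for v in select_protease_resistant(library)])
```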

In other embodiments, the procedure is performed in a screening format wherein mutant proteins are expressed and purified in a high throughput format and screened for protease resistance with retention of biological activity.

In further embodiments of the invention, the protein of interest is evolved to have increased shelf life. A library of the mutagenized nucleic acid sequence encoding the protein of interest is expressed in a display format or high throughput expression format and exposed for various lengths of time to the conditions under which one wants to evolve stability (heat, metal ions, nonphysiological pH of, for example, <6 or >8, lyophilization, freeze-thawing). Genes are recovered from survivors, for example, by PCR. The DNA is subjected to mutagenesis, such as RSR, and the process is repeated until the desired level of improvement is achieved.

K. Evolved Single Chain Versions of Multisubunit Factors

As discussed above, in some embodiments of the invention, the substrate for evolution by RSR is preferably a single chain construction. The possibility of performing asymmetric mutagenesis on constructs of homomultimeric proteins provides important new pathways for further evolution of such constructs that are not open to the proteins in their natural homomultimeric states. In particular, a given mutation in a homomultimer will result in that change being present in each identical subunit. In single chain constructs, however, the domains can mutate independently of each other.

Conversion of multisubunit proteins to single chain constructs with new and useful properties has been demonstrated for a number of proteins. Most notably, antibody heavy and light chain variable domains have been linked into single chain Fv's (Bird et al., Science 242:423-426 (1988)), and this strategy has resulted in antibodies with improved thermal stability (Young et al., FEBS Lett 377:135-139 (1995)) or sensitivity to proteolysis (Solar et al., Prot. Eng. 8:717-723 (1995)). A functional single chain version of IL5, a homodimer, has been constructed and shown to have affinity for the IL5 receptor similar to that of the wild type protein, and this construct has been used to perform asymmetric mutagenesis of the dimer (Li et al., J. Biol. Chem. 271:1817-1820 (1996)). A single chain version of urokinase-type plasminogen activator has been made, and it has been shown that the single chain construct is more resistant to plasminogen activator inhibitor type 1 than the native homodimer (Higazi et al., Blood 87:3545-3549 (1996)). Finally, a single-chain insulin-like growth factor I/insulin hybrid has been constructed and shown to have higher affinity for chimeric insulin/IGF-1 receptors than either natural ligand (Kristensen et al., Biochem. J. 305:981-986 (1995)).

In general, a linker is constructed which joins the amino terminus of one subunit of a protein of interest to the carboxyl terminus of another subunit in the complex. These fusion proteins can consist of linked versions of homodimers, homomultimers, heterodimers or higher order heteromultimers. In the simplest case, one adds polypeptide linkers between the native termini to be joined. Two significant variations can be made. First, one can construct diverse libraries of variations of the wild type sequence in and around the junctions and in the linkers to facilitate the construction of active fusion proteins. Second, Zhang et al. (Biochemistry 32:12311-12318 (1993)) have described circular permutations of T4 lysozyme in which the native amino and carboxyl termini have been joined and novel amino and carboxyl termini have been engineered into the protein. The methods of circular permutation, libraries of linkers, and libraries of junctional sequences flanking the linkers allow one to construct libraries that are diverse in topological linkage strategies and in primary sequence. These libraries are expressed and selected for activity. Any of the above mentioned strategies for screening or selection can be used, with phage display being preferable in most cases. Genes encoding active fusion proteins are recovered, mutagenized, reselected, and subjected to standard RSR protocols to optimize their function. Preferably, a population of selected mutant single chain constructs is PCR amplified in two separate PCR reactions such that each of the two domains is amplified separately. Oligonucleotides are derived from the 5′ and 3′ ends of the gene and from both strands of the linker. The separately amplified domains are shuffled in separate reactions, then the two populations are recombined using PCR reassembly to generate intact single chain constructs for further rounds of selection and evolution.
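The domain-wise recombination step can be sketched as string manipulation. The sketch assumes each construct is represented as domain A, then the linker, then domain B, with the linker occurring exactly once; single-crossover recombination within each pool is a simplified stand-in for DNA shuffling, and the sequences and the GGGGS linker are illustrative only. Because the two pools are recombined separately and only then rejoined through the linker, sequence from domain A never crosses into domain B.

```python
import random

def shuffle_domains_separately(constructs, linker, n_offspring, seed=0):
    """Recombine selected single-chain constructs one domain at a time.

    Each construct is domain A + linker + domain B (linker assumed to occur once).
    Each domain pool is recombined separately (single crossover, a simplified
    stand-in for shuffling), then the halves are rejoined through the linker.
    """
    rng = random.Random(seed)
    pool_a, pool_b = [], []
    for construct in constructs:
        domain_a, domain_b = construct.split(linker)
        pool_a.append(domain_a)
        pool_b.append(domain_b)

    def crossover(pool):
        x, y = rng.choice(pool), rng.choice(pool)
        point = rng.randrange(1, min(len(x), len(y)))  # domains need >= 2 residues
        return x[:point] + y[point:]

    return [crossover(pool_a) + linker + crossover(pool_b) for _ in range(n_offspring)]

selected = ["AAAAAA" + "GGGGS" + "CCCCCC", "TTTTTT" + "GGGGS" + "WWWWWW"]
for offspring in shuffle_domains_separately(selected, "GGGGS", 5):
    print(offspring)
```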

V. Improved Properties of Pharmaceutical Proteins

A. Evolved Specificity for Receptor or Cell Type of Interest

The majority of the proteins listed in Table I are either receptors or ligands of pharmaceutical interest. Many agonists, such as chemokines or interleukins, agonize more than one receptor. Evolved mutants with improved specificity may have reduced side effects due to their loss of activity on receptors which are implicated in a particular side effect profile. For most of these ligand/receptor pairs, mutant forms with improved affinity would have improved pharmaceutical properties. For example, an antagonistic form of RANTES with improved affinity for CKR5 should be an improved inhibitor of HIV infection by virtue of achieving greater receptor occupancy for a given dose of the drug. Using the selections and screens outlined above in combination with RSR, the affinities and specificities of any of the proteins listed in Table I can be improved. For example, the mammalian display format could be used to evolve TNF receptors with improved affinity for TNF.

Other examples include evolved interferon alpha variants that arrest tumor cell proliferation but do not stimulate NK cells, IL2 variants that stimulate the low affinity IL2 receptor complex but not the high affinity receptor (or vice versa), superantigens that stimulate only a subset of the V beta proteins recognized by the wild type protein (preferably a single V beta), antagonistic forms of chemokines that specifically antagonize only a receptor of interest, antibodies with reduced cross-reactivity, and chimeric factors that specifically activate a particular receptor complex. As an example of this latter case, one could make chimeras between IL2 and IL4, 7, 9, or 15 that also can bind the IL2 receptor alpha, beta and gamma chains (Theze et al., Imm. Today 17:481-486 (1996)), and select for chimeras that retain binding for the intermediate affinity IL2 receptor complex on monocytes but have reduced affinity for the high affinity IL2 alpha, beta, gamma receptor complex on activated T cells.
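The receptor occupancy argument made above for an improved RANTES antagonist can be made quantitative with the standard equilibrium binding relation, occupancy = [L]/([L] + Kd). The sketch assumes simple 1:1 binding without ligand depletion; the antagonist concentration and Kd values are hypothetical.

```python
def fractional_occupancy(ligand_nM, kd_nM):
    """Fraction of receptors bound at equilibrium (1:1 binding, no ligand depletion)."""
    return ligand_nM / (ligand_nM + kd_nM)

dose_nM = 1.0  # hypothetical free antagonist concentration
for kd_nM in (10.0, 1.0, 0.1):  # hypothetical parent and evolved affinities
    print(f"Kd={kd_nM} nM -> occupancy={fractional_occupancy(dose_nM, kd_nM):.2f}")
```

At a fixed dose, each tenfold gain in affinity in this range moves occupancy from about 0.09 to 0.50 to 0.91.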

B. Evolved Agonists with Increased Potency

In some embodiments of the invention, a preferred strategy is the selection or screening for mutants with increased agonist activity using the whole cell formats described above, combined with RSR. For example, a library of mutants of IL3 is expressed in active form on phage as described by Gram et al. (J. Immun. Meth. 161:169-176 (1993)). Clonal lysates resulting from infection with plaque-purified phage are prepared in a high throughput format such as a 96-well microtiter format. An IL3-dependent cell line expressing a reporter gene such as GFP is stimulated with the phage lysates in a high throughput 96-well format. Phage that result in positive signals at the greatest dilution of phage supernatants are recovered; alternatively, DNA encoding the mutant IL3 can be recovered by PCR. In some embodiments, single cells expressing GFP under control of an IL3 responsive promoter can be stimulated with the IL3 phage library, and the positive cells sorted by FACS. The nucleic acid is then subjected to PCR, and the process is repeated until the desired level of improvement is obtained.
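Scoring phage clones by the greatest dilution that still gives a positive reporter signal can be sketched as follows. The plate layout, readings and threshold are hypothetical; each well maps dilution factors to reporter readings, and clones are ranked by their endpoint dilution.

```python
def endpoint_dilution(signals_by_dilution, threshold):
    """Greatest dilution factor whose reporter signal exceeds `threshold`, or None."""
    positive = [d for d, s in signals_by_dilution.items() if s > threshold]
    return max(positive) if positive else None

def rank_clones(plate, threshold):
    """Rank wells (well -> dilution series) by endpoint dilution, most potent first."""
    scored = {well: endpoint_dilution(series, threshold) for well, series in plate.items()}
    hits = {w: d for w, d in scored.items() if d is not None}
    return sorted(hits, key=hits.get, reverse=True)

plate = {
    "A1": {10: 900, 100: 400, 1000: 50},
    "A2": {10: 800, 100: 600, 1000: 300},
    "A3": {10: 40, 100: 20, 1000: 10},
}
print(rank_clones(plate, threshold=100))
```

Here A2 remains positive at a 1000-fold dilution and ranks first, A1 scores at 100-fold, and A3 never exceeds the threshold.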

TABLE I
POLYPEPTIDE CANDIDATES FOR EVOLUTION

Name
Alpha-1 antitrypsin
Angiostatin
Antihemolytic factor
Apolipoprotein
Apoprotein
Atrial natriuretic factor
Atrial natriuretic polypeptide
Atrial peptides
C—X—C chemokines (e.g., T39765, NAP-2, ENA-78, Gro-a, Gro-b, Gro-c, IP-10, GCP-2, NAP-4, SDF-1, PF4, MIG)
Calcitonin
CC chemokines (e.g., Monocyte chemoattractant protein-1, Monocyte chemoattractant protein-2, Monocyte chemoattractant protein-3, Monocyte inflammatory protein-1 alpha, Monocyte inflammatory protein-1 beta, RANTES, I309, R83915, R91733, HCC1, T58847, D31065, T64262)
CD40 ligand
Collagen
Colony stimulating factor (CSF)
Complement factor 5a
Complement inhibitor
Complement receptor 1
Factor IX
Factor VII
Factor VIII
Factor X
Fibrinogen
Fibronectin
Glucocerebrosidase
Gonadotropin
Hedgehog proteins (e.g., Sonic, Indian, Desert)
Hemoglobin (for blood substitute; for radiosensitization)
Hirudin
Human serum albumin
Lactoferrin
Luciferase
Neurturin
Neutrophil inhibitory factor (NIF)
Osteogenic protein
Parathyroid hormone
Protein A
Protein G
Relaxin
Renin
Salmon calcitonin
Salmon growth hormone
Soluble complement receptor I
Soluble I-CAM 1
Soluble interleukin receptors (IL-1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15)
Soluble TNF receptor
Somatomedin
Somatostatin
Somatotropin
Streptokinase
Superantigens, i.e., Staphylococcal enterotoxins (SEA, SEB, SEC1, SEC2, SEC3, SED, SEE), Toxic shock syndrome toxin (TSST-1), Exfoliating toxins A and B, Pyrogenic exotoxins A, B, and C, and M. arthritidis mitogen
Superoxide dismutase
Thymosin alpha 1
Tissue plasminogen activator
Tumor necrosis factor beta (TNF beta)
Tumor necrosis factor receptor (TNFR)
Tumor necrosis factor-alpha (TNF alpha)
Urokinase

C. Evolution of Components of Eukaryotic Signal Transduction or Transcriptional Pathways

Using the screens and selections listed above, RSR can be used in several ways to modify eukaryotic signal transduction or transcriptional pathways. Any component of a signal transduction pathway of interest can be evolved, including the regulatory regions and the transcriptional activators that interact with these regions and with chemicals that induce transcription. This generates regulatory systems in which transcription is activated more potently by the natural inducer or by analogues of the normal inducer. This technology is preferred for the development and optimization of diverse assays of biotechnological interest. For example, dozens of 7 transmembrane (7-TM) receptors are validated targets for drug discovery (see, for example, Siderovski et al., Curr Biol., 6(2):211-212 (1996); An et al., FEBS Lett., 375(1-2):121-124 (1995); Raport et al., Gene, 163(2):295-299 (1995); Song et al., Genomics, 28(2):347-349 (1995); Strader et al., FASEB J., 9(9):745-754 (1995); Benka et al., FEBS Lett., 363(1-2):49-52 (1995); Spiegel, J. Clin Endocrinol. Metab., 81(7):2434-2442 (1996); Post et al., FASEB J., 10(7):741-749 (1996); Reisine et al., Ann NY Acad. Sci., 780:168-175 (1996); Spiegel, Annu. Rev. Physiol., 58:143-170 (1996); Barak et al., Biochemistry, 34(47):15407-15414 (1995); and Shenker, Baillieres Clin. Endocrinol. Metab., 9(3):427-451 (1995)). The development of sensitive high throughput assays for agonists and antagonists of these receptors is essential for exploiting the full potential of combinatorial chemistry in discovering such ligands. Additionally, biodetectors or biosensors for different chemicals can be developed by evolving 7-TM's to respond agonistically to novel chemicals or proteins of interest. In this case, selection would be for constructs that are activated by the new chemical or polypeptide to be detected. Screening could be done simply with fluorescence or light activated cell sorting, since the desired improvement is coupled to light production.

In addition to detection of small molecules such as pharmaceutical drugs and environmental pollutants, biosensors can be developed that will respond to any chemical for which there are receptors, or for which receptors can be evolved by recursive sequence recombination, such as hormones, growth factors, metals and drugs. The receptors may be intracellular and direct activators of transcription, or they may be membrane bound receptors that activate transcription of the signal indirectly, for example by a phosphorylation cascade. They may also not act on transcription at all, but may produce a signal by some post-transcriptional modification of a component of the signal generating pathway. These receptors may also be generated by fusing domains responsible for binding different ligands with different signaling domains. Again, recursive sequence recombination can be used to increase the amplitude of the signal generated, to optimize expression and functioning of chimeric receptors, and to alter the specificity of the chemicals detected by the receptor.

For example, G proteins can be evolved to efficiently couple mammalian 7-TM receptors to yeast signal transduction pathways. There are 23 presently known G alpha protein loci in mammals, which can be grouped by sequence and functional similarity into four groups: Gs (Gna, Gnal), Gi (Gnai-2, Gnai-3, Gnai-1, Gnao, Gnat-1, Gnat-2, Gnaz), Gq (Gnaq, Gna-11, Gna-14, Gna-15) and G12 (Gna-12, Gna-13) (B. Nurnberg et al., J. Mol. Med., 73:123-132 (1995)). They possess an endogenous GTPase activity allowing reversible functional coupling between ligand-bound receptors and downstream effectors such as enzymes and ion channels. G alpha proteins are complexed noncovalently with G beta and G gamma proteins as well as with their cognate 7-TM receptor(s). Receptor and signal specificity are controlled by the particular combination of G alpha, G beta (of which there are five known loci) and G gamma (seven known loci) subunits. Activation of the heterotrimeric complex by ligand-bound receptor results in dissociation of the complex into G alpha monomers and G beta, gamma dimers, which then transmit signals by associating with downstream effector proteins. The G alpha subunit is believed to be the subunit that contacts the 7-TM, and thus it is a focal point for the evolution of chimeric or evolved G alpha subunits that can transmit signals from mammalian 7-TM's to yeast downstream genes.

Yeast based bioassays for mammalian receptors will greatly facilitate the discovery of novel ligands. Kang et al. (Mol. Cell Biol. 10:2582-2590 (1990)) have described the partial complementation of yeast strains bearing mutations in SCG1 (GPA1), a homologue of the alpha subunits of G proteins involved in signal transduction in mammalian cells, by mammalian and hybrid yeast/mammalian G alpha proteins. These hybrids have partial function, such as complementing the growth defect in scg1 strains, but do not allow mating and hence do not fully complement function in the pheromone signal transduction pathway. Price et al. (Mol. Cell Biol. 15:6188-6195 (1995)) have expressed rat somatostatin receptor subtype 2 (SSTR2) in yeast and demonstrated transmission of ligand binding signals by this 7-TM receptor through yeast and chimeric mammalian/yeast G alpha subunits (“coupling”) to a HIS3 reporter gene under control of the pheromone responsive promoter FUS-1, enabling otherwise HIS3(−) cells to grow on minimal medium lacking histidine.

Such strains are useful as reporter strains for mammalian receptors, but suffer from important limitations, as exemplified by the study of Kang et al., where there appears to be a block in the transmission of signals from the yeast pheromone receptors to the mammalian G proteins. In general, to couple a mammalian 7-TM receptor to yeast signal transduction pathways, one couples the mammalian receptor to yeast, mammalian, or chimeric G alpha proteins, and these will in turn productively interact with downstream components in the pathway to induce expression of a pheromone responsive promoter such as FUS-1. Such functional reconstitution is commonly referred to as “coupling”.

The methods described herein can be used to evolve the coupling of mammalian 7-TM receptors to yeast signal transduction pathways. A typical approach is as follows: (1) clone a 7-TM of interest into a yeast strain with a modified pheromone response pathway similar to that described by Price (e.g., strains deficient in FAR1, a negative regulator of G₁ cyclins, and deficient in SST2, the loss of which makes the cells hypersensitive to pheromone), (2) construct libraries of chimeras between the mammalian G alpha protein(s) known or thought to interact with the 7-TM of interest and GPA1 or homologous yeast G alpha proteins, (3) place a selectable reporter gene such as HIS3 under control of the pheromone responsive promoter FUS1 (Price et al., Mol. Cell Biol. 15:6188-6195 (1995)); alternatively, a screenable gene such as luciferase may be placed under the control of the FUS1 promoter, (4) transform library (2) into strain (3) (HIS3(−)), (5) screen or select for expression of the reporter in response to the ligand of interest, for example by growing the library of transformants on minimal plates in the presence of ligand to demand HIS3 expression, and (6) recover the selected cells and apply RSR to evolve improved expression of the reporter under the control of the pheromone responsive promoter FUS1.

A second important consideration in evolving strains with optimized reporter constructs for signal transduction pathways of interest is optimizing the signal to noise ratio (the ratio of gene expression under inducing vs. noninducing conditions). Many 7-TM pathways are leaky, such that the maximal induction of a typical reporter gene is 5 to 10-fold over background. This range of signal to noise may be insufficient to detect small effects in many high throughput assays. Therefore, it is of interest to couple the 7-TM pathway to a second, nonlinear amplification system that is tuned to be below but near the threshold of activation in the uninduced state. An example of a nonlinear amplification system is expression of genes driven by the lambda P_L promoter. Complex cooperative interactions between lambda repressor molecules bound at three adjacent sites in the cI promoter result in very efficient repression above a certain concentration of repressor. Below a critical threshold, dramatic induction is seen, and there is a window within which a small decrease in repressor concentration leads to a large increase in gene expression (Ptashne, A Genetic Switch: Phage Lambda and Higher Organisms, Blackwell Scientific Publ., Cambridge, Mass., 1992). Analogous effects are seen for some eukaryotic promoters such as those regulated by GAL4. Placing the expression of a limiting component of a transcription factor for such a promoter (GAL4) under the control of a GAL4 enhanced 7-TM responsive promoter results in small levels of induction of the 7-TM pathway signal being amplified to a much larger change in the expression of a reporter construct also under the control of a GAL4 dependent promoter.
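The effect of cooperativity on the switch-like response described above can be illustrated with a Hill-type repression function, output = 1/(1 + (R/K)^n), where R is the repressor level. This is an idealized model chosen for illustration, not a description of the lambda or GAL4 systems; K and the Hill coefficient n are hypothetical parameters.

```python
def repressed_output(repressor, k_half, hill_n):
    """Relative promoter output under a repressor (idealized Hill-type repression)."""
    return 1.0 / (1.0 + (repressor / k_half) ** hill_n)

def fold_change(repressor_before, repressor_after, k_half=1.0, hill_n=1):
    """Output after a change in repressor level divided by output before it."""
    return (repressed_output(repressor_after, k_half, hill_n)
            / repressed_output(repressor_before, k_half, hill_n))

for n in (1, 3):
    # twofold drop in repressor, from 2K to K
    print(f"n={n}: fold change = {fold_change(2.0, 1.0, hill_n=n):.2f}")
```

For the same twofold drop in repressor near the threshold, a noncooperative promoter (n = 1) changes its output 1.5-fold, whereas a cooperative one (n = 3) changes it 4.5-fold, which is the kind of amplification the coupled system aims to exploit.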

An example of such a coupled system is to place GAL4 under control of the FUS-1 pheromone responsive promoter and to have the intracellular GAL4 (itself a transcriptional activator) level positively feed back on itself by placing a GAL4 binding site upstream of the FUS-1 promoter. A reporter gene is also put under the control of a GAL4 activated promoter. This system is designed so that GAL4 expression will nonlinearly self-amplify and co-amplify expression of a reporter gene such as luciferase upon reaching a certain threshold in the cell. RSR can be used to great advantage to evolve reporter constructs with the desired signaling properties, as follows: (1) A single plasmid construct is made which contains both the GAL4/pheromone pathway regulated GAL4 gene and the GAL4 regulated reporter gene. (2) This construct is mutagenized and transformed into the appropriately engineered yeast strain expressing a 7-TM and chimeric yeast/mammalian protein of interest. (3) Cells are stimulated with agonists and screened (or selected) based on the activity of the reporter gene. In a preferred format, luciferase is the reporter gene and activity is quantitated before and after stimulation with the agonist, thus allowing for a quantitative measurement of signal to noise for each colony. (4) Cells with improved reporter properties are recovered, the constructs are shuffled, and RSR is applied to further evolve the plasmid to give optimal signal to noise characteristics.
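The per-colony signal to noise measurement in step (3) can be sketched as the ratio of luciferase readings after and before agonist stimulation. The data format, the optional background subtraction and the ranking rule are assumptions for illustration, and the readings are hypothetical.

```python
def signal_to_noise(uninduced, induced, background=0.0):
    """Induced/uninduced luciferase ratio after subtracting instrument background."""
    baseline = max(uninduced - background, 1e-9)  # avoid division by zero
    return (induced - background) / baseline

def pick_colonies(readings, top_n, background=0.0):
    """readings maps colony id -> (uninduced RLU, agonist-stimulated RLU)."""
    ranked = sorted(readings,
                    key=lambda c: signal_to_noise(*readings[c], background),
                    reverse=True)
    return ranked[:top_n]

readings = {"c1": (100, 800), "c2": (20, 600), "c3": (500, 1000)}
print(pick_colonies(readings, 2))
```

Ranking by the ratio rather than by the induced signal alone favors constructs with low leakiness: colony c2 has a lower induced reading than c3 but a 30-fold induction, compared with 2-fold for c3.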

These approaches are general and illustrate how any component of a signal transduction pathway or transcription factor could be evolved using RSR and the screens and selections described above. For example, these specific methods could be used to evolve: 7-TM receptors with specificity for novel ligands; nuclear receptors with specificity for novel ligands (for example, to obtain herbicide- or other small-molecule-inducible expression of genes of interest in transgenic plants, such that a given set of genes can be induced upon treatment with a given chemical agent); transcription factors that are responsive to viral factors (thus inducing antiviral or lethal genes in cells expressing this transcription factor [transgenics or cells treated with gene therapy constructs]); or transcription factors that are active specifically in cancer cells (for example, p53-deficient cells, thus allowing one to infect with gene therapy constructs expressing conditionally lethal genes in a tumor-specific fashion).

The following examples are offered by way of illustration, not by way of limitation.

Experimental Examples

I. Evolution of BIAP

A preferred strategy to evolve BIAP is as follows. A codon usage library is constructed from 60-mer oligonucleotides such that the central 20 bases of each oligo encode the wild-type protein sequence using degenerate codons. Preferably, codons that are very rare in the prokaryotic host of choice, such as E. coli, are not used. The 20 bases at each end of the oligo use non-degenerate codons that are preferred in E. coli. The oligonucleotides are assembled into full-length genes as described above. The assembled products are cloned into an expression vector by techniques well known in the art. In some embodiments, the codon usage library is expressed with a library of secretory leader sequences, each of which directs the encoded BIAP protein to the E. coli periplasm. A library of leader sequences is used to optimize the combination of leader sequence and mutant. Examples of leader sequences are reviewed by Schatz et al. (Ann. Rev. Genet. 24:215-248 (1990)). The cloned BIAP genes are expressed under the control of an inducible promoter, such as the arabinose promoter. Arabinose-induced colonies are screened by spraying with a BIAP substrate, bromo-chloro-indolyl phosphate (BCIP). The bluest colonies are picked visually and subjected to the RSR procedures described herein.

The oligonucleotides for construction of the codon usage library are listed in Table II. The corresponding locations of these oligonucleotides are provided in FIG. 1.

TABLE II
1. AACCCTCCAG TTCCGAACCC CATATGATGA TCACCCTGCG TAAACTGCCG
2. AACCCTCCAG TTCCGAACCC CATATGAAAA AAACCGCT
3. AACCCTCCAG TTCCGAACCC ATATACATAT GCGTGCTAAA
4. AACCCTCCAG TTCCGAACCC CATATGAAAT ACCTGCTGCC GACC
5. AACCCTCCAG TTCCGAACCC GATATACATA TGAAACAGTC
6. TGGTGTTATG TCTGCTCAGG CDATGGCDGT DGAYTTYCAY CTGGTTCCGG TTGAAGAGGA
7. GGCTGGTTTC GCTACCGTTG CDCARGCDGC DCCDAARGAY CTGGTTCCGG TTGAAGAGGA
8. CACCCCGATC GCTATCTCTT CYTTYGCDTC YACYGGYTCY CTGGTTCCGG TTGAAGAGGA
9. GCTGCTGGCT GCTCAGCCGG CDATGGCDAT GGAYATYGGY CTGGTTCCGG TTGAAGAGGA
10. TGCCGCTGCT GTTCACCCCG GTDACYAARG CDGCDCARGT DCTGGTTCCG GTTGAAGAGG A
11. CCCGGCTTTC TGGAACCGTC ARGCDGCDCA RGCDCTGGAC GTTGCTAAAA AACTGCAGCC
12. ACGTTATCCT GTTCCTGGGT GAYGGYATGG GYGTDCCDAC CGTTACCGCT ACCCGTATCC
13. AAACTGGGTC CGGAAACCCC DCTGGCDATG GAYCARTTYC CGTACGTTGC TCTGTCTAAA
14. GGTTCCGGAC TCTGCTGGTA CYGCDACYGC DTAYCTGTGC GGTGTTAAAG GTAACTACCG
15. CTGCTCGTTA CAACCAGTGC AARACYACYC GYGGYAAYGA AGTTACCTCT GTTATGAACC
16. TCTGTTGGTG TTGTTACCAC YACYCGYGTD CARCAYGCDT CTCCGGCTGG TGCTTACGCT
17. GTACTCTGAC GCTGACCTGC CDGCDGAYGC DCARATGAAC AGGTTGCCAGG CATCGCTGC
18. ACATCGACGT TATCCTGGGT GGYGGYCGYA ARTAYATGTT CCCGGTTGGT ACCCCGGACC
19. TCTGTTAACG GTGTTCGTAA RCGYAARCAR AAYCTGGTDC AGGCTTGGCA GGCTAAACAC
20. GAACCGTACC GCTCTGCTGC ARGCDGCDGA YGAYTCYTCT GTTACCCACC TGATGGGTCT
21. AATACAACGT TCAGCAGGAC CAYACYAARG AYCCDACYCT GCAGGAAATG ACCGAAGTTG
22. AACCCGCGTG GTTTCTACCT GTTYGTDGAR GGYGGYCGYA TCGACCACGG TCACCACGAC
23. GACCGAAGCT GGTATGTTCG AYAAYGCDAT YGCDAARGCT AACGAACTGA CCTCTGAACT
24. CCGCTGACCA CTCTCACGTT TTYTCYTTYG GYGGYTAYAC CCTGCGTGGT ACCTCTATCT
25. GCTCTGGACT CTAAATCTTA YACYTCYATY CTGTAYGGYA ACGGTCCGGG TTACGCTCTG
26. CGTTAACGAC TCTACCTCTG ARGAYCCDTC YTAYCARCAG CAGGCTGCTG TTCCGCAGGC
27. AAGACGTTGC TGTTTTCGCT CGYGGYCCDC ARGCDCAYCT GGTTCACGGT GTTGAAGAAG
28. ATGGCTTTCG CTGGTTGCGT DGARCCDTAY ACYGAYTGYA ACCTGCCGGC TCCGACCACC
29. TGCTCACCTG GCTGCTTMAC CDCCDCCDCT GGCDCTGCTG GCTGGTGCTA TGCTGCTCCT C
30. TTCCGCCTCT AGAGAATTCT TARTACAGRG THGGHGCCAG GAGGAGCAGC ATAGCACCAG CC
31. AAGCAGCCAG GTGAGCAGCG TCHGGRATRG ARGTHGCGGT GGTCGGAGCC GGCAGGTT
32. CGCAACCAGC GAAAGCCATG ATRTGHGCHA CRAARGTYTC TTCTTCAACA CCGTGAACCA
33. GCGAAAACAG CAACGTCTTC RCCRCCRTGR GTYTCRGAHG CCTGCGGAAC AGCAGCCTGC
34. AGAGGTAGAG TCGTTAACGT CHGGRCGRGA RCCRCCRCCC AGAGCGTAAC CCGGACCGTT
35. AAGATTTAGA GTCCAGAGCT TTRGAHGGHG CCAGRCCRAA GATAGAGGTA CCACGCAGGG
36. ACGTGAGAGT GGTCAGCGGT HACCAGRATC AGRGTRTCCA GTTCAGAGGT CAGTTCGTTA
37. GAACATACCA GCTTCGGTCA GHGCCATRTA HGCYTTRTCG TCGTGGTGAC CGTGGTCGAT
38. GGTAGAAACC ACGCGGGTTA CGRGAHACHA CRCGCAGHGC AACTTCGGTC ATTTCCTGCA
39. TCCTGCTGAA CGTTGTATTT CATRTCHGCH GGYTCRAACA GACCCATCAG GTGGGTAACA
40. CAGCAGAGCG GTACGGTTCC AHACRTAYTG HGCRCCYTGG TGTTTAGCCT GCCAAGCCTG
41. TACGAACACC GTTAACAGAA GCRTCRTCHG GRTAYTCHGG GTCCGGGGTA CCAACCGGGA
42. CCCAGGATAA CGTCGATGTC CATRTTRTTH ACCAGYTGHG CAGCGATGTC CTGGCAACCG
43. CAGGTCAGCG TCAGAGTACC ARTTRCGRTT HACRGTRTGA GCGTAAGCAC CAGCCGGAGA
44. TGGTAACAAC ACCAACAGAT TTRCCHGCYT TYTTHGCRCG GTTCATAACA GAGGTAACTT
45. CACTGGTTGT AACGAGCAGC HGCRGAHACR CCRATRGTRC GGTAGTTACC TTTAACACCG
46. ACCAGCAGAG TCCGGAACCT GRCGRTCHAC RTTRTARGTT TTAGACAGAG CAACGTACGG
47. GGGTTTCCGG ACCCAGTTTA CCRTTCATYT GRCCYTTCAG GATACGGGTA GCGGTAACGG
48. CCCAGGAACA GGATAACGTT YTTHGCHGCR GTYTGRATHG GCTGCAGTTT TTTAGCAACG
49. ACGGTTCCAG AAAGCCGGGT CTTCCTCTTC AACCGGAACC AG
50. CCTGAGCAGA CATAACACCA GCHGCHACHG CHACHGCCAG CGGCAGTTTA CGCAGGGTGA
51. ACCGGGGTGA ACAGCAGCGG CAGCAGHGCC AGHGCRATRG TRGACTGTTT CATATGTATA TC
52. GCCGGCTGAG CAGCCAGCAG CAGCAGRCCH GCHGCHGCGG TCGGCAGCAG GTAGTTTCA
53. AAGAGATAGC GATCGGGGTG GTCAGHACRA TRCCCAGCAG TTTAGCACGC ATATGTATAT
54. CAACGGTAGC GAAACCAGCC AGHGCHACHG CRATHGCRAT AGCGGTTTTT TTCATATG
55. AGAATTCTCT AGAGGCGGAA ACTCTCCAAC TCCCAGGTT
56. TGAGAGGTTG AGGGTCCAAT TGGGAGGTCA AGGCTTGGG

All oligonucleotides are listed 5′ to 3′. The code for degenerate positions is: R, A or G; Y, C or T; H, A or C or T; D, A or G or T.
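The degeneracy of each oligonucleotide determines how many distinct sequences it contributes to the codon usage library. A minimal sketch for counting and expanding them follows. It assumes the standard IUPAC nucleotide codes (the legend above defines only R, Y, H and D; the remaining letters, such as the M in oligonucleotide 29, are included on that assumption) and ignores the spaces used to group bases.

```python
from itertools import product

# Standard IUPAC nucleotide codes (assumed; the table legend lists only R, Y, H, D).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def n_variants(oligo: str) -> int:
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    count = 1
    for base in oligo.replace(" ", "").upper():
        count *= len(IUPAC[base])
    return count


def expand(oligo: str) -> list[str]:
    """All concrete sequences encoded by a (short) degenerate oligonucleotide."""
    choices = [IUPAC[b] for b in oligo.replace(" ", "").upper()]
    return ["".join(p) for p in product(*choices)]


if __name__ == "__main__":
    oligo_6 = "TGGTGTTATG TCTGCTCAGG CDATGGCDGT DGAYTTYCAY CTGGTTCCGG TTGAAGAGGA"
    print(n_variants(oligo_6))   # 216 (three D positions x three Y positions)
    print(expand("GAY"))         # ['GAC', 'GAT']
```

Counting is cheap for any oligonucleotide, whereas `expand` grows multiplicatively and is only practical for short stretches.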

II. Mammalian Surface Display

During an immune response, antibodies naturally undergo a process of affinity maturation, resulting in mutant antibodies with improved affinities for their cognate antigens. This process is driven by somatic hypermutation of antibody genes coupled with clonal selection (Berek and Milstein, Immun. Rev. 96:23-41 (1987)). Patten et al. (Science 271:1086-1091 (1996)) have reconstructed the progression of a catalytic antibody from the germline sequence, which binds a p-nitrophenylphosphonate hapten with an affinity of 135 micromolar, to the affinity-matured sequence, which has acquired nine somatic mutations and binds with an affinity of 10 nanomolar. The affinity maturation of this antibody can be recapitulated and improved upon by using cassette mutagenesis of the CDRs (or random mutagenesis, such as with PCR), mammalian display, FACS selection for improved binding, and RSR to rapidly evolve improved affinity by recombining mutations that encode improved binding.

Genomic antibody expression shuttle vectors similar to those described by Gascoigne et al. (Proc. Natl. Acad. Sci. (U.S.A.) 84:2936-2940 (1987)) are constructed such that libraries of mutant V region exons can be readily cloned into the shuttle vectors. The kappa construct is cloned onto a plasmid encoding puromycin resistance, and the heavy chain is cloned onto a vector encoding neomycin resistance. The cDNA-derived variable region sequences encoding the mature and germline heavy and light chain V regions are reconfigured by PCR mutagenesis into genomic exons flanked by Sfi I sites, with complementary Sfi I sites placed at the appropriate locations in the genomic shuttle vectors. The oligonucleotides used to create the intronic Sfi I sites flanking the VDJ exon are: 5′ Sfi I: 5′-TTCCATTTCA TACATGGCCG AAGGGGCCGT GCCATGAGGA TTTT-3′; 3′ Sfi I: 5′-TTCTAAATG CATGTTGGCC TCCTTGGCCG GATTCTGAGC CTTCAGGACC A-3′. Standard PCR mutagenesis protocols are applied to produce libraries of mutants wherein the following sets of residues (numbered according to Kabat, Sequences of Proteins of Immunological Interest, U.S. Dept. of Health and Human Services, 1991) are randomized to NNK codons (GATC, GATC, GC):

Chain   CDR   Mutated residues
V-L     1     30, 31, 34
V-L     2     52, 53, 55
V-H     2     55, 56, 65
V-H     "4"   74, 76, 78
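The size of each randomized library follows from the codon scheme and the three residues randomized per CDR in the table above. This sketch assumes the standard genetic code and the usual IUPAC meaning of NNK (third position G or T). The parenthetical in the text lists G or C for the third position, which would ordinarily be written NNS, so the function takes the third-position bases as a parameter. Under either reading there are 32 codons per position encoding all 20 amino acids plus one stop codon (TAG), so three randomized positions give 32³ = 32,768 codon combinations and 20³ = 8,000 stop-free protein sequences.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def randomized_codons(third: str = "GT") -> list[str]:
    """Codons of the form N-N-X, where X is drawn from `third` (GT = NNK, CG = NNS)."""
    return ["".join(p) for p in product("ACGT", "ACGT", third)]


def library_stats(n_sites: int, third: str = "GT") -> dict:
    """Codon and protein diversity for n_sites independently randomized positions."""
    codons = randomized_codons(third)
    residues = {CODON_TABLE[c] for c in codons}
    amino_acids = residues - {"*"}
    return {
        "codons_per_site": len(codons),
        "amino_acids": len(amino_acids),
        "stop_codons": sum(CODON_TABLE[c] == "*" for c in codons),
        "dna_variants": len(codons) ** n_sites,
        "protein_variants_no_stop": len(amino_acids) ** n_sites,
    }


if __name__ == "__main__":
    print(library_stats(3))        # NNK at three positions
    print(library_stats(3, "CG"))  # NNS at three positions
```

Because about 1 in 32 codons is a stop, roughly 9% of clones with three randomized positions (1 - (31/32)³) carry at least one stop codon, which is worth bearing in mind when sizing the transfections.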

Stable transfectant lines are made for each of the two light and heavy chain constructs (mature and germline), using the B cell myeloma AG8-653 (a gift from J. Kearney) as a host and standard electroporation protocols. Libraries of mutant plasmids encoding the indicated V-L mutants are transfected into the stable transfectant expressing the germline V-H, and the V-H mutants are transfected into the germline V-L stable transfectant line. In both cases, the libraries are introduced by protoplast fusion (Sambrook et al., Molecular Cloning, CSH Press (1987)) to ensure that the majority of transfected cells receive one and only one mutant plasmid sequence (which would not be the case for electroporation, where the majority of the transfected cells would receive many plasmids, each expressing a different mutant sequence).

The p-nitrophenylphosphonate hapten (JWJ-1) recognized by this antibody is synthesized as described by Patten et al. (Science 271:1086-1091 (1996)). JWJ-1 is coupled directly to 5-(((2-aminoethyl)thio)acetyl)fluorescein (Molecular Probes, Inc.) by formation of an amide bond using a standard coupling chemistry such as EDAC (March, Advanced Organic Chemistry, Third edition, John Wiley and Sons, 1985) to give a monomeric JWJ-1-FITC probe. A “dimeric” conjugate (two molecules of JWJ-1 coupled to a FACS marker) is made in order to obtain a higher-avidity probe, thus making low-affinity interactions (such as with the germline antibody) more readily detected by FACS. This is generated by staining with Texas Red conjugated to an anti-fluorescein antibody in the presence of two equivalents of JWJ-1-FITC. The bivalent structure of IgG then provides a homogeneous bivalent reagent. A spin column is used to remove excess JWJ-1-FITC molecules that are not bound to the anti-FITC reagent. A tetravalent reagent is made as follows. One equivalent of biotin is coupled with EDAC to two equivalents of ethylenediamine, and this is then coupled to the free carboxylate on JWJ-1. The biotinylated JWJ-1 product is purified by ion exchange chromatography and characterized by mass spectrometry. FITC-labelled avidin is incubated with the biotinylated JWJ-1 in order to generate a tetravalent probe.

The FACS selection is performed as follows, according to a protocol similar to that of Panka et al. (Proc. Natl. Acad. Sci. (U.S.A.) 85:3080-3084 (1988)). After transfection of libraries of mutant antibody genes by protoplast fusion (with recovery for 36-72 hours), the cells are incubated on ice with fluorescently labelled hapten. The incubation is done on ice to minimize pinocytosis of the FITC conjugate, which may contribute to nonspecific background. The cells are then sorted on the FACS either with or without a washing step. Sorting without a washing step is preferable because the off rate for the germline antibody prior to affinity maturation is expected to be very fast (>0.1 s⁻¹; Patten et al., Science 271:1086-1091 (1996)); a washing step adds a complicating variable. The brightest 0.1-10% of the cells are collected.
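Why a washing step is risky for the germline antibody can be seen from first-order dissociation kinetics. This sketch assumes simple exponential decay with no rebinding (an idealization; the multivalent probes described above slow the loss in practice), uses the lower bound of 0.1 s⁻¹ quoted above, and picks a hypothetical 60-second wash for illustration.

```python
import math


def half_life(k_off: float) -> float:
    """Half-life (s) of a complex dissociating with first-order rate k_off (1/s)."""
    return math.log(2) / k_off


def fraction_remaining(k_off: float, seconds: float) -> float:
    """Fraction of hapten still bound after `seconds`, assuming no rebinding."""
    return math.exp(-k_off * seconds)


if __name__ == "__main__":
    k_off = 0.1  # s^-1, lower bound quoted for the germline antibody
    print(f"half-life: {half_life(k_off):.1f} s")                        # 6.9 s
    print(f"bound after 60 s wash: {fraction_remaining(k_off, 60):.4f}")  # 0.0025
```

Since 0.1 s⁻¹ is only a lower bound on the off rate, the true loss during such a wash would be at least this severe.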

Four parameters are manipulated to optimize the selection for increased binding: monomeric vs. dimeric vs. tetrameric hapten; the concentration of hapten used in the staining reaction (a low concentration selects for high affinity, i.e., a low Kd); the time between washing and FACS (a longer time selects for low off rates); and selectivity in the gating (i.e., taking the top 0.1% to 10%, more preferably the top 0.1%). The constructs expressing the germline, the mature, and both half-germline combinations are used as controls to optimize this selectivity.
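The statement that a low hapten concentration selects for high affinity can be checked with the equilibrium occupancy of a single binding site, θ = [L]/([L] + Kd). The sketch below uses the germline (135 µM) and mature (10 nM) affinities quoted earlier. It assumes a monovalent probe at equilibrium with no ligand depletion, and so deliberately ignores the avidity effects that motivate the dimeric and tetrameric reagents.

```python
def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Equilibrium fractional occupancy of a monovalent binding site."""
    return ligand_molar / (ligand_molar + kd_molar)


KD_GERMLINE = 135e-6  # 135 micromolar, from the text
KD_MATURE = 10e-9     # 10 nanomolar, from the text

if __name__ == "__main__":
    for conc in (10e-9, 1e-6, 100e-6):
        germ = fraction_bound(conc, KD_GERMLINE)
        mature = fraction_bound(conc, KD_MATURE)
        print(f"[hapten] = {conc:.0e} M: germline {germ:.2e}, "
              f"mature {mature:.2f}, mature/germline {mature / germ:.0f}")
```

At 10 nM hapten the mature antibody is roughly 50% occupied while the germline antibody is occupied several thousand times less; near 100 µM the germline antibody is also substantially occupied and the discrimination falls to a few-fold.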

Plasmids are recovered from the FACS-selected cells by transformation of an E. coli host with Hirt supernatants. Alternatively, the mutant V gene exons are PCR-amplified from the FACS-selected cells. The recovered V gene exons are subjected to RSR and recloned into the corresponding genomic shuttle vector, and the procedure is applied recursively until the mean fluorescence intensity has increased. A relevant positive control for improved binding is transfection with the affinity-matured 48G7 exons (Patten et al., op. cit.).

In a further experiment, equal numbers of germline and each of the two half-germline transfectants are mixed. The brightest cells are selected under the conditions described above. The V genes are recovered by PCR, recloned into expression vectors, and co-transfected, either as two plasmids per E. coli followed by protoplast fusion, or by bulk electroporation. The mean fluorescence intensity of the transfectants should increase due to enrichment of mature relative to germline V regions.

This methodology can be applied to evolve any receptor-ligand or binding partner interaction. Natural expression formats can be used to express libraries of mutants of any receptor whose affinity for natural or novel ligands one wants to improve. Typical examples would be improvement of the affinity of T cell receptors for ligands of interest (e.g., MHC/tumor peptide antigen complexes) or of TNF receptor for TNF (soluble forms of TNF receptors are used therapeutically to neutralize TNF activity).

This format can also be used to select for mutant forms of ligands by expressing the ligand in a membrane-bound form with an engineered membrane anchor, by a strategy analogous to that of Wettstein et al. (J. Exp. Med. 174:219-28 (1991)). FACS selection is then performed with fluorescently labelled receptor. In this format one could, for example, evolve improved receptor antagonists from naturally occurring receptor antagonists (IL1 receptor antagonist, for example). Mutant forms of agonists with improved affinity for their cognate receptors could also be evolved in this format. These mutants would be candidates for improved agonists or potent receptor antagonists, analogous to reported antagonistic mutant forms of IL3.

III. Evolution of Alpha Interferon

There are 18 known non-allelic human interferon-alpha (IFN-α) genes, with highly related primary structures (78-95% identical) and a broad range of biological activities. Many hybrid interferons with interesting biological activities differing from those of the parental molecules have been described (reviewed by Horisberger and Di Marco, Pharm. Ther. 66:507-534 (1995)). A consensus human alpha interferon, IFN-Con1, has been constructed synthetically by placing at each position the most common residue among fourteen known IFN-α's, and it compares favorably with the naturally occurring interferons (Ozes et al., J. Interferon Res. 12:55-59 (1992)). This IFN contains 20 amino acid changes relative to IFN-α2a, the IFN-α to which it is most closely related. IFN-Con1 has 10-fold higher specific antiviral activity than any known natural IFN subtype. IFN-α Con1 has in vitro antiviral, antiproliferative and NK cell activation activities 10- to 20-fold higher than those of recombinant IFN-α2a (the major IFN used clinically). Thus, there is considerable interest in producing interferon hybrids which combine the most desirable traits of two or more interferons. However, given the enormous number of potential hybrids and the lack of a crystal structure of IFN-α or of the IFN-α receptor, there is a perceived impasse in the development of novel hybrids (Horisberger and Di Marco, Pharm. Ther. 66:507-534 (1995)).

The biological effects of IFN-α's are diverse and include such properties as induction of an antiviral state (induction of factors that arrest translation and degrade mRNA); inhibition of cell growth; induction of Class I and Class II MHC; activation of monocytes and macrophages; activation of natural killer cells; activation of cytotoxic T cells; modulation of Ig synthesis in B cells; and pyrogenic activity.

The various IFN-α subtypes have unique spectra of activities on different target cells and unique side effect profiles (Ortaldo et al., Proc. Natl. Acad. Sci. (U.S.A.) 81:4926-4929 (1984); Overall et al., J. Interferon Res. 12:281-288 (1992); Fish and Stabbing, Biochem. Biophys. Res. Comm. 112:537-546 (1983); Weck et al., J. Gen. Virol. 57:233-237 (1981)). For example, human IFNα has very mild side effects but low antiviral activity. Human IFNαB has very high antiviral activity, but relatively severe side effects. Human IFNα7 lacks NK activity and blocks NK stimulation by other IFNα's. Human IFN-αJ lacks the ability to stimulate NK cells, but it can bind to the IFN-α receptor on NK cells and block the stimulatory activity of IFN-αA (Langer et al., J. Interferon Res. 6:97-105 (1986)).

The therapeutic applications of interferons are limited by diverse and severe side effect profiles, which include flu-like symptoms, fatigue, neurological disorders including hallucination, fever, hepatic enzyme elevation, and leukopenia. The multiplicity of effects of IFN-α's has stimulated the hypothesis that there may be more than one receptor, or a multicomponent receptor, for the IFN-α family (R. Hu et al., J. Biol. Chem. 268:12591-12595 (1993)). Thus, the abundant naturally occurring diversity within the human alpha IFNs (and hence a large sequence space of recombinants), together with the complexity of the IFN-α receptors and activities, creates an opportunity for the construction of superior hybrids.

A. Complexity of the Sequence Space

FIG. 2 shows the protein sequences of 11 human IFN-α's. The differences from consensus are indicated. Those positions where a degenerate codon can capture all of the diversity are indicated with an asterisk. Examination of the aligned sequences reveals that there are 57 positions with two, 15 positions with three, and 4 positions with four possible amino acids encoded in this group of alpha interferon genes. Thus, the potential diversity encoded by permutation of all of this naturally occurring diversity is:

$$2^{57} \times 3^{15} \times 4^{4} \approx 5.3 \times 10^{26}$$

Of the 76 polymorphisms, spread over a total of 175 sites in the 11 interferon genes, 171 of the 175 changes can be incorporated into homologue libraries using single degenerate codons at the corresponding positions. For example, Arg, Trp and Gly can all be encoded by the degenerate codon [A,T,G]GG. Using such a strategy, 1.3×10²⁵ hybrids can be captured with a single set of degenerate oligonucleotides. As is evident from Tables III to VI, 27 oligonucleotides (the 18 PCR oligonucleotides of Table IV plus the nine block oligonucleotides of Table V) are sufficient to shuffle all eleven human alpha interferons. Virtually all of the natural diversity is thereby encoded and fully permuted due to the degeneracies in the nine “block” oligonucleotides in Table V.
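Both numerical claims in the preceding paragraph can be checked directly. This sketch assumes the standard genetic code; `translate_degenerate` is a hypothetical helper that takes the allowed bases at each of the three codon positions.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Polymorphic positions in the 11-gene alignment: 57 two-way, 15 three-way, 4 four-way.
natural_diversity = 2**57 * 3**15 * 4**4


def translate_degenerate(*positions: str) -> set[str]:
    """Amino acids encoded by a degenerate codon given as allowed bases per position."""
    return {CODON_TABLE["".join(p)] for p in product(*positions)}


if __name__ == "__main__":
    print(f"{natural_diversity:.1e}")                      # 5.3e+26
    print(sorted(translate_degenerate("ATG", "G", "G")))   # ['G', 'R', 'W']
```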

B. Properties of a “Coarse Grain” Search of Homologue Sequence Space

The modeled structure of IFN alpha (Kontsek, Acta Vir. 38:345-360 (1994)) has been divided into nine segments based on a combination of two criteria: maintaining secondary structure elements as single units, and placing the segment boundaries in regions of high identity. Hence, one can capture the whole family with a single set of mildly degenerate oligonucleotides. Table III and FIG. 2 give the precise locations of these boundaries at the protein and DNA levels, respectively. It should be emphasized that this particular segmentation scheme is arbitrary and that other segmentation schemes could also be pursued. The general strategy does not depend on placement of recombination boundaries at regions of high identity between the family members or on any particular algorithm for breaking the structure into segments.

TABLE III Segmentation Scheme for Alpha Interferon

Segment   Amino Acids   # Alleles   # Permutations of all Sequence Variations
1         1-21          5           1024
2         22-51         10          6.2 × 10⁴
3         52-67         6           96
4         68-80         7           1024
5         81-92         7           192
6         93-115        10          2.5 × 10⁵
7         116-131       4           8
8         132-138       4           8
9         139-167       9           9216

Many of the IFNs are identical over some of the segments, and thus there are fewer than eleven different “alleles” of each segment. Thus, a library consisting of the permutations of the segment “alleles” would have a potential complexity of 2.1×10⁷ (5 segment #1's × 10 segment #2's × . . . × 9 segment #9's). This is far more than can be examined in most of the screening procedures described, and thus this is a good problem for using RSR to search the sequence space.
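The 2.1×10⁷ figure is the product of the allele counts in Table III. The sketch below recomputes it, checks that the segment boundaries in Table III tile residues 1-167 without gaps or overlaps, and, purely as an illustration of the screening burden, converts the library size into 96-well plates on the hypothetical assumption of one hybrid per well.

```python
from math import ceil, prod

# (first residue, last residue, number of distinct alleles) for segments 1-9, from Table III
SEGMENTS = [(1, 21, 5), (22, 51, 10), (52, 67, 6), (68, 80, 7), (81, 92, 7),
            (93, 115, 10), (116, 131, 4), (132, 138, 4), (139, 167, 9)]

# Check that consecutive segments are contiguous.
for (_, end, _), (start, _, _) in zip(SEGMENTS, SEGMENTS[1:]):
    assert start == end + 1, (end, start)

library_size = prod(n for _, _, n in SEGMENTS)
plates_96 = ceil(library_size / 96)

if __name__ == "__main__":
    print(library_size)   # 21168000, i.e. about 2.1 x 10^7
    print(plates_96)      # 220500 plates to test every hybrid once
```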

C. Detailed Strategies for Using RSR to Search the IFN-Alpha Homologue Sequence Space

The methods described herein for oligo-directed shuffling (i.e., bridge oligonucleotides) are employed to construct libraries of interferon alpha hybrids, and the general methods described above are employed to screen or select these mutants for improved function. As there are numerous formats in which to screen or select for improved interferon activity, many of which depend on the unique properties of interferons, exemplary IFN-based assays are described below.

D. A Protocol for a Coarse Grain Search of Hybrid IFN Alpha Sequence Space

In brief, libraries are constructed wherein the 11 homologous forms of the nine segments are permuted (note that in many cases two homologues are identical over a given segment). All nine segments are PCR-amplified out of all eleven IFN alpha genes with the eighteen oligonucleotides listed in Table IV and reassembled into full-length genes by oligo-directed recombination. An arbitrary number of clones from the library (e.g., 1000) are prepared in a 96-well expression/purification format and screened, and the hybrids with the most potent antiviral activities are selected. Nucleic acid is recovered by PCR amplification and subjected to recombination using bridge oligonucleotides. These steps are repeated until candidates with the desired properties are obtained.
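The coarse grain protocol can be sketched as a recursive select-and-recombine loop over segment alleles. This is a toy simulation, not the experimental procedure: each hybrid is a tuple of allele indices (one per segment, using the allele counts of Table III), the “antiviral activity” is a hypothetical additive score drawn at random, and recombination simply lets each child inherit each segment from a randomly chosen selected parent, mimicking crossover at the segment boundaries. The library size of 1000 comes from the text; keeping the top 10 clones per round is an arbitrary choice.

```python
import random

ALLELES = [5, 10, 6, 7, 7, 10, 4, 4, 9]  # distinct alleles per segment (Table III)


def toy_activity(hybrid, weights):
    """Hypothetical additive score: each (segment, allele) pair adds a fixed weight."""
    return sum(weights[seg][allele] for seg, allele in enumerate(hybrid))


def recombine(parents, n_children, rng):
    """Block-wise recombination: each segment is inherited from a random parent."""
    return [tuple(rng.choice(parents)[seg] for seg in range(len(ALLELES)))
            for _ in range(n_children)]


def coarse_grain_search(rounds=5, library_size=1000, keep=10, seed=0):
    rng = random.Random(seed)
    weights = [[rng.random() for _ in range(n)] for n in ALLELES]
    # Initial library: random permutations of the segment alleles.
    library = [tuple(rng.randrange(n) for n in ALLELES) for _ in range(library_size)]
    best_scores = []
    for _ in range(rounds):
        ranked = sorted(library, key=lambda h: toy_activity(h, weights), reverse=True)
        parents = ranked[:keep]                  # "screen" and pick the best hybrids
        best_scores.append(toy_activity(parents[0], weights))
        # Next round: carry the parents forward and add recombinant offspring.
        library = parents + recombine(parents, library_size - keep, rng)
    optimum = sum(max(w) for w in weights)      # best achievable toy score
    return best_scores, optimum


if __name__ == "__main__":
    scores, optimum = coarse_grain_search()
    print([round(s, 3) for s in scores], round(optimum, 3))
```

Carrying the selected parents into the next library guarantees that the best score never decreases from round to round; with a real assay, measurement noise would make this only approximately true.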

E. Strategies for Examining the Space of >10²⁶ Fine Grain Hybrids

In brief, each of the nine segments is synthesized with one degenerate oligo per segment. Degeneracies are chosen to capture all of the IFN-alpha diversity that can be captured with a single degenerate codon without adding any non-natural sequence. A second set of degenerate oligonucleotides encoding the nine segments is generated wherein all of the natural diversity is captured, but additional non-natural mutations are included at positions where necessitated by the constraints of the genetic code. In most cases, all of the diversity can be captured with a single degenerate codon; in some cases, a degenerate codon will capture all of the natural diversity but will add one non-natural mutation; at a few positions, it is not possible to capture the natural diversity without using a highly degenerate codon, which will create more than one non-natural mutation. It is at these positions that the second set of oligonucleotides will differ from the first set by being more inclusive. Each of the nine synthetic segments is then amplified by PCR with the 18 PCR oligonucleotides. Full-length genes are generated using the oligo-directed recombination method, transfected into a host, and assayed for hybrids with desired properties. The best hybrids (e.g., the top 10%, 1% or 0.1%; preferably the top 1%) are subjected to RSR, and the process is repeated until a candidate with the desired properties is obtained.

F. “Non-Gentle” Fine Grain Search

On the one hand, one could make libraries wherein each segment is derived from the degenerate synthetic oligonucleotides, which will encode random permutations of the homologue diversity. In this case, the initial library will very sparsely search the space of >10²⁵ fine grain hybrids that are possible with this family of genes. One could proceed by breeding positives from this search together. However, there would be a large number of differences between independent members of such libraries, and consequently the breeding process would not be very “gentle,” because pools of relatively divergent genes would be recombined at each step.

G. “Gentle” Fine Grain Search

One way to make this approach more “gentle” would be to obtain a candidate starting point and to search gently from there. This starting point could be one of the natural IFN-alphas (such as IFN alpha-2a, which is the most widely used therapeutically), the characterized IFN-Con1 consensus interferon, or a hit from screening the shuffled IFN-alphas described above. Given a starting point, one would make separate libraries wherein the degenerate segment libraries are bred into the founder sequence one at a time. Improved hits from each library would then be bred together to gently build up mutations throughout the molecule.
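The difference between the “non-gentle” and “gentle” library designs can be made concrete by counting. As a stand-in for the degenerate segment libraries, this sketch reuses the coarse grain allele counts of Table III and a hypothetical founder (allele 0 in every segment); each one-segment library contains only hybrids that differ from the founder in that single segment.

```python
from math import prod

ALLELES = [5, 10, 6, 7, 7, 10, 4, 4, 9]  # distinct alleles per segment (Table III)


def one_segment_libraries(founder, alleles):
    """For each segment, all hybrids that differ from the founder only in that segment."""
    libraries = []
    for seg, n_alleles in enumerate(alleles):
        library = []
        for allele in range(n_alleles):
            if allele != founder[seg]:
                hybrid = list(founder)
                hybrid[seg] = allele
                library.append(tuple(hybrid))
        libraries.append(library)
    return libraries


if __name__ == "__main__":
    founder = (0,) * len(ALLELES)  # hypothetical starting sequence
    libs = one_segment_libraries(founder, ALLELES)
    print(sum(len(lib) for lib in libs), "gentle variants vs",
          prod(ALLELES), "full permutations")
```

Under these assumptions the nine gentle libraries hold 53 variants in total, each one segment away from the founder, versus about 2.1×10⁷ hybrids in the fully permuted library.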

H. Functional Cellular Assays

The following assays, well known in the art, are used to screen IFN alpha mutants: inhibition of viral killing (standard error of 30-50%); inhibition of plaque-forming units (very low standard error, so that small effects can be measured); reduced viral yield (useful for nonlethal, non-plaque-forming viruses); inhibition of cell growth (³H-thymidine uptake assay); activation of NK cells to kill tumor cells; and suppression of tumor formation by human IFN administered to nude mice engrafted with human tumors (skin tumors, for example).

Most of these assays are amenable to high-throughput screening. Libraries of recombinant IFN alpha mutants are expressed and purified in high-throughput formats, such as expression, lysis and purification in a 96-well format using anti-IFN antibodies or an epitope tag and affinity resin. The purified IFN preparations are screened in a high-throughput format and scored, and the mutants encoding the highest activities of interest are subjected to further mutagenesis, such as RSR, and the process is repeated until a desired level of activity is obtained.

I. Phage Display

Standard phage display formats are used to display biologically active IFN. Libraries of chimeric IFN genes are expressed in this format and are selected (positively or negatively) for binding (or reduced binding) to one or more purified IFN receptor preparations or to one or more IFN receptor-expressing cell types.

J. GFP or Luciferase Under Control of IFN-Alpha Dependent Promoter

Protein expressed by mutants can be screened in a high-throughput format on a reporter cell line which expresses GFP or luciferase under the control of an IFN alpha-responsive promoter, such as an MHC Class I promoter driving GFP expression.

K. Stimulation of Target Cells with Intact Infectious Particles

Purification of active IFN will limit the throughput of the assays described above. Expression of active IFN alpha on filamentous phage M13 would allow one to obtain homogeneous preparations of IFN mutants in a format where thousands or tens of thousands of mutants could readily be handled. Gram et al. (J. Imm. Meth. 161:169-176 (1993)) have demonstrated that human IL3, a cytokine with a protein fold similar in topology to IFN alpha, can be expressed on the surface of M13 and that the resultant phage can present active IL3 to IL3-dependent cell lines. Similarly, Saggio et al. (Gene 152:35-39 (1995)) have shown that human ciliary neurotrophic factor, a four-helix bundle cytokine, is biologically active when expressed on phage at concentrations similar to those of the soluble cytokine. Analogously, libraries of IFN alpha mutants can be expressed on M13, and lysates of defined titre can be used to present biologically active IFN in the high-throughput assays and selections described herein.

The following calculation supports the feasibility of applying this technology to IFN alpha. Assuming (1) titres of 1×10¹⁰ phage/ml with five active copies of interferon displayed per phage, and (2) that the displayed interferon is as active as soluble recombinant interferon (it may well be more potent due to multivalency), the question is whether one can reasonably expect to see biological activity:

(1×10¹⁰ phage/ml) × (5 IFN molecules/phage) × (1 mole/6×10²³ molecules) × (26,000 g/mole) × (10⁹ ng/g) ≈ 2.2 ng/ml
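The same unit conversion, written as a small function so that the titre, copy number or molecular weight can be varied. This is a sketch only; it uses the more precise Avogadro constant (the calculation above rounds it to 6×10²³) and the 26,000 g/mole figure used in the text.

```python
AVOGADRO = 6.022e23  # molecules per mole (the text rounds this to 6e23)


def displayed_ng_per_ml(titre_per_ml: float, copies_per_phage: float,
                        mw_g_per_mol: float) -> float:
    """Concentration (ng/ml) of phage-displayed protein in a lysate."""
    moles_per_ml = titre_per_ml * copies_per_phage / AVOGADRO
    return moles_per_ml * mw_g_per_mol * 1e9  # g -> ng


if __name__ == "__main__":
    print(f"{displayed_ng_per_ml(1e10, 5, 26_000):.1f} ng/ml")  # 2.2 ng/ml
```

Because the result scales linearly with `copies_per_phage`, higher-valency display formats raise the effective concentration proportionally at the same titre.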

The range of concentrations used in biological assays is 1 ng/ml for NK activation, 0.1-10 ng/ml for antiproliferative activity on Eskol cells, and 0.1-1 ng/ml on Daudi cells (Ozes et al., J. Interferon Res. 12:55-59 (1992)). The estimated 2.2 ng/ml therefore falls within or above these working ranges. Although some subtypes are glycosylated, interferon alpha-2a and consensus interferon are expressed in active recombinant form in E. coli, so at least these two do not require glycosylation for activity. Thus, IFN alpha expressed on filamentous phage is likely to be biologically active in phage lysates without further purification. Libraries of IFN chimeras are expressed in phage display formats and scored in the assays described above and below to identify mutants with improved properties to be put into further rounds of RSR.

When one phage is sufficient to activate one cell, owing to the high valency of the displayed protein (five copies per phage in the gene III format; hundreds per phage in the gene VIII format; tens in the lambda gene V format), a phage lysate can be used directly at a suitable dilution to stimulate cells carrying a GFP reporter construct under the control of an IFN-responsive promoter. Assuming that the phage remain attached through stimulation, expression and FACS purification of the responsive cells, one could then directly FACS-purify hybrids with improved activity from very large libraries (up to, and perhaps larger than, 10⁷ phage per FACS run).

A second way in which FACS is used to advantage in this format is the following. Cells carrying a GFP-type reporter construct can be stimulated in a multiwell format with one lysate per well. All stimulated cells are FACS-purified to collect the brightest cells, and the IFN genes are recovered and subjected to RSR, with iteration of the protocol until the desired level of improvement is obtained. In this protocol, the stimulation is performed with individual concentrated lysates, and hence the requirement that a single phage be sufficient to stimulate the cell is relaxed. Furthermore, one can gate to collect the brightest cells, which in turn should have the most potent phage attached to them.

L. Cell Surface Display Protocol for IFN Alpha Mutants

A sample protocol for the cell surface display of IFN alpha mutants follows. This form of display has at least two advantages over phage display. First, the protein is displayed by a eukaryotic cell and hence can be expressed in a properly glycosylated form, which may be necessary for some IFN alphas (and other growth factors). Second, it is a very high valency display format and is preferred for detecting activity from very weakly active mutants.

In brief, a library of mutant IFNs is constructed wherein a polypeptide signal for addition of a phosphoinositol tail has been fused to the carboxyl terminus, thus targeting the protein for surface expression (Wettstein et al., J. Exp. Med. 174:219-28 (1991)). The library is used to transfect the reporter cells described above (luciferase reporter gene) in a microtiter format. Positives are detected with a charge-coupled device (CCD) camera. Nucleic acids are recovered either by Hirt extraction and retransformation of the host or by PCR, and are subjected to RSR for further evolution.

M. Autocrine Display Protocol for Viral Resistance

A sample protocol for the autocrine display of IFN alpha mutants follows. In brief, a library of IFN mutants is generated in a vector which allows for inducible expression (e.g., from a metallothionein promoter) and efficient secretion. The recipient cell line, carrying an IFN-responsive reporter cassette [GFP or luciferase], is transfected with the mutant IFN constructs, and their expression is induced. Mutants which stimulate the IFN-responsive promoter are detected by FACS or with a CCD camera.

A variation on this format is to challenge the transfectants with virus and select for survivors. One could perform multiple rounds of viral challenge and outgrowth on each set of transfectants prior to retrieving the genes. Multiple rounds of killing and outgrowth allow exponential amplification of a small advantage and hence help in detecting small improvements in protection against viral killing.
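The exponential amplification of a small advantage can be illustrated with a simple compounding model. This sketch assumes, hypothetically, that each round of challenge and outgrowth multiplies the ratio of an improved clone to the parental clone by a constant factor (1 + s); real selection dynamics will be noisier, and the 10% advantage in the usage line is an arbitrary example.

```python
def enrichment(advantage_per_round: float, rounds: int) -> float:
    """Fold change in the improved:parental ratio after `rounds` rounds."""
    return (1.0 + advantage_per_round) ** rounds


def final_frequency(initial_frequency: float, advantage_per_round: float,
                    rounds: int) -> float:
    """Frequency of the improved clone after selection, given its starting frequency."""
    ratio = initial_frequency / (1.0 - initial_frequency)
    ratio *= enrichment(advantage_per_round, rounds)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    for rounds in (1, 5, 10, 20):
        print(rounds, round(enrichment(0.10, rounds), 2))
```

Under this model, a 10% survival advantage per round becomes about a 2.6-fold enrichment after 10 rounds and about 6.7-fold after 20.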

TABLE IV
Oligonucleotides needed for blockwise recombination: 18 oligonucleotides for alpha interferon shuffling

1. 5′-TGT[G/A]ATCTG[C/T]CT[C/G]AGACC
2. 5′-GGCACAAATG[G/A/C]G[A/C]AGAATCTCTC
3. 5′-AGAGATTCT[G/T]C[C/T/G]CATTTGTGCC
4. 5′-CAGTTCCAGAAG[A/G]CT[G/C][C/A]AGCCATC
5. 5′-GATGGCT[T/G][G/C]AG[T/C]CTTCTGGAACTG
6. 5′-CTTCAATCTCTTCA[G/C]CACA
7. 5′-TGTG[G/C]TGAAGAGATTGAAG
8. 5′-GGA[C/G]CTCCTAGA
9. 5′-TCTAGGAG[G/C][G/C]TCT[G/C][T/A]TCC
10. 5′-GAACTT[T/G/A][T/A]CCAGCA[A/C]TGAAT
11. 5′-ATTCA[T/G]TTGCTGG[A/T][A/T/C]AAGTTC
12. 5′-GGACT[T/C]CATCCTGGCTGTG
13. 5′-CACAGCCAGGATG[G/A]AGTCC
14. 5′-AAGAATCACTCTTTATCT
15. 5′-AGATAAAGAGTGATTCTT
16. 5′-TGGGAGGTTGTCAGAGCAG
17. 5′-CTGCTCTGACAACCTCCCA
18. 5′-TCA[A/T]TCCTT[C/A]CTC[T/C]TTAA

Brackets indicate degeneracy, with an equal mixture of the specified bases at those positions. The purpose of the degeneracy is to allow this one set of primers to prime all members of the IFN family with similar efficiency. The choice of the oligo-driven recombination points is important because these points are “overwritten” in each cycle of breeding and hence cannot coevolve with the rest of the sequence over many cycles of selection.
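Because each bracket is an equal mixture of the listed bases, a degenerate oligonucleotide is really a pool of concrete sequences, one per combination of bracket choices, each present at equal frequency in the ideal case. The sketch below expands the bracket notation of Table IV into that pool; the parsing rules (bracketed options separated by `/`, an optional `5′-` prefix) are assumptions matched to how the table is written.

```python
import itertools
import re
from typing import List

# A token is either a bracketed set such as [G/A/C] or a single fixed base.
TOKEN = re.compile(r"\[([ACGT](?:/[ACGT])+)\]|([ACGT])")
VALID = re.compile(r"(?:\[[ACGT](?:/[ACGT])+\]|[ACGT])+")


def expand_degenerate(oligo: str) -> List[str]:
    """Return every concrete sequence encoded by a bracketed degenerate oligo."""
    seq = oligo.replace("5′-", "").replace("5'-", "")
    if not VALID.fullmatch(seq):
        raise ValueError(f"unrecognised characters in {oligo!r}")
    positions = [
        bracket.split("/") if bracket else [base]
        for bracket, base in TOKEN.findall(seq)
    ]
    # Cartesian product over positions: each result is one member of the primer pool.
    return ["".join(p) for p in itertools.product(*positions)]


oligo_1 = "5′-TGT[G/A]ATCTG[C/T]CT[C/G]AGACC"  # oligonucleotide 1 of Table IV
variants = expand_degenerate(oligo_1)
print(len(variants))  # 8
print(variants[0])    # TGTGATCTGCCTCAGACC
```

Oligonucleotide 1 has three two-fold positions, so it represents 2 × 2 × 2 = 8 sequences, each ideally at 1/8 of the mixture.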

TABLE V
Oligonucleotides needed for “fine grain” recombination of natural diversity over each of the nine blocks

Block #    Length of oligo required (nucleotides)
1          76
2          95
3          65
4          56
5          51
6          93
7          50
8          62
9          80

TABLE VI
Amino acids that can be reached by a single-step mutation in the codon of interest

Wild-type amino acid    Amino acids reachable by one mutation
W    C, R, G, L
Y    F, S, C, H, N, D
F    L, I, V, S, Y, C
L    S, W, F, I, M, V, P
V    F, L, I, M, A, D, E, G
I    F, L, M, V, T, N, K, S, R
A    S, P, T, V, D, E, G
G    V, A, D, E, R, S, C, W
M    L, I, V, T, K, R
S    F, L, Y, C, W, P, T, A, R, G, N, I
T    S, P, A, I, M, N, K, R
P    S, T, A, L, H, Q, R
C    F, S, Y, R, G, W
N    Y, H, K, D, S, T, I
Q    Y, H, K, E, L, P, R
H    Y, Q, N, D, L, P, R
D    Y, H, N, E, V, A, G
E    Q, K, D, V, A, G
R    L, P, H, Q, C, W, S, G, K, T, I, M
K    Q, N, E, R, T, I, M
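Entries of this kind can be derived mechanically from the genetic code by substituting each of the three positions of a codon with the other three bases and translating the result. The sketch below does this with the standard genetic code (an assumption; the text does not name the code table) and excludes stop codons and synonymous changes. Because Table VI refers to a particular codon of interest, results for amino acids with several codons can differ from its rows depending on which codon is chosen.

```python
import itertools
from typing import Set

# Standard genetic code, codons enumerated in T, C, A, G order with the first
# position varying slowest. '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def reachable_from_codon(codon: str) -> Set[str]:
    """Amino acids reachable by one nucleotide substitution in `codon`.

    Stop codons and synonymous changes are excluded.
    """
    wild_type = CODE[codon]
    reachable = set()
    for i, original in enumerate(codon):
        for base in BASES:
            if base == original:
                continue
            aa = CODE[codon[:i] + base + codon[i + 1:]]
            if aa not in ("*", wild_type):
                reachable.add(aa)
    return reachable


def reachable_from_amino_acid(aa: str) -> Set[str]:
    """Union of reachable_from_codon over every codon that encodes `aa`."""
    result: Set[str] = set()
    for codon, encoded in CODE.items():
        if encoded == aa:
            result |= reachable_from_codon(codon)
    return result


print(sorted(reachable_from_codon("TGG")))     # ['C', 'G', 'L', 'R', 'S']
print(sorted(reachable_from_amino_acid("M")))  # ['I', 'K', 'L', 'R', 'T', 'V']
```

For methionine, which has the single codon ATG, the computed set matches the M row of Table VI.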

Based on this table, the polymorphic positions in IFN alpha at which all of the diversity can be captured by a degenerate codon have been identified. Oligonucleotides of the lengths indicated in Table V, with the degeneracies inferred from Table VI, are synthesized.
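The text does not describe how a degenerate codon is chosen for a polymorphic position. A minimal sketch, assuming the standard genetic code and the 15 IUPAC nucleotide symbols, is to search all 15³ degenerate codons for one that encodes every amino acid observed at the position. The ranking used here (fewest unwanted products, counting stop as unwanted, then fewest distinct codons) is a hypothetical criterion, as is the example position carrying D or E.

```python
import itertools
import math
from typing import Set, Tuple

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes: each symbol stands for an equal mixture of these bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded_amino_acids(degenerate_codon: str) -> Set[str]:
    """Amino acids (and '*' for stop) encoded by a three-symbol IUPAC codon."""
    expansions = (IUPAC[symbol] for symbol in degenerate_codon)
    return {CODE["".join(c)] for c in itertools.product(*expansions)}


def best_degenerate_codon(targets: Set[str]) -> Tuple[str, Set[str]]:
    """Degenerate codon encoding every amino acid in `targets`.

    Hypothetical ranking: fewest unwanted products (stop counts as unwanted),
    then fewest distinct codons in the mixture.
    """
    best = None
    for symbols in itertools.product(IUPAC, repeat=3):
        codon = "".join(symbols)
        products = encoded_amino_acids(codon)
        if not targets <= products:
            continue
        n_codons = math.prod(len(IUPAC[s]) for s in codon)
        score = (len(products - targets), n_codons)
        if best is None or score < best[0]:
            best = (score, codon, products)
    if best is None:
        raise ValueError(f"no degenerate codon encodes {sorted(targets)}")
    return best[1], best[2]


# Hypothetical polymorphic position where the natural variants carry D or E.
codon, products = best_degenerate_codon({"D", "E"})
print(codon, sorted(products))
```

A position counts as fully captured, in the sense of the text, when the best codon introduces no unwanted products. For D/E several two-codon mixtures tie (GAS, GAK, GAM, GAW), and the search simply returns the first one it meets.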

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.

All references cited herein are expressly incorporated in their entirety for all purposes.

1. A method for evolving a protein encoded by a DNA substrate molecule comprising: (a) digesting at least a first and second DNA substrate molecule, wherein the at least a first and second substrate molecules differ from each other in at least one nucleotide, with a restriction endonuclease; (b) ligating the mixture to generate a library of recombinant DNA molecules; (c) screening or selecting the products of (b) for a desired property; and (d) recovering a recombinant DNA substrate molecule encoding an evolved protein.

2-273. (canceled)